On self-similar sets with overlaps and 
inverse theorems for entropy 



Michael Hochman 



Abstract 



We study the Hausdorff dimension of self-similar sets and measures on $\mathbb{R}$. We
show that if the dimension is smaller than the minimum of 1 and the similarity
dimension, then at small scales there are super-exponentially close cylinders. This
is a step towards the folklore conjecture that such a drop in dimension is explained
only by exact overlaps, and confirms the conjecture in cases where the contraction
parameters are algebraic. It also gives an affirmative answer to a conjecture of
Furstenberg, showing that the projections of the "1-dimensional Sierpinski gasket"
in irrational directions are all of dimension 1.

As another consequence, when a family of self-similar sets or measures is
parametrized in a real-analytic manner, then, under an extremely mild
non-degeneracy condition, the set of "exceptional" parameters has Hausdorff
dimension 0. Thus, for example, there is at most a zero-dimensional set of parameters
$1/2 < \lambda < 1$ such that the corresponding Bernoulli convolution has dimension
$< 1$, and similarly for Sinai's problem on iterated function systems that contract
on average.

A central ingredient of the proof is an inverse theorem for the growth of entropy
of convolutions of probability measures. For the dyadic partition $\mathcal{D}_n$ of
$\mathbb{R}$ into cells of side $2^{-n}$, we show that if
$\frac{1}{n}H(\nu*\mu,\mathcal{D}_n) < \frac{1}{n}H(\mu,\mathcal{D}_n)+\delta$, then, when
restricted to a random element of a partition $\mathcal{D}_i$, $1 \le i \le n$, either
$\mu$ is close to uniform or $\nu$ is close to atomic. This should be compared to results
in additive combinatorics that give the global structure of measures satisfying
$\frac{1}{n}H(\nu*\mu,\mathcal{D}_n) < \frac{1}{n}H(\mu,\mathcal{D}_n)+O(\frac{1}{n})$.

1 Introduction 

1.1 Self-similar sets and measures 

In this paper an iterated function system (IFS) will mean a finite family
$\Phi = \{\varphi_i\}_{i\in\Lambda}$ of linear contractions of $\mathbb{R}$,
$\varphi_i(x) = r_i x + a_i$ with $|r_i| < 1$ and $a_i \in \mathbb{R}$. To avoid trivialities
we assume throughout that there are at least two distinct contractions. A self-similar
set is the attractor of such a system, i.e. the unique compact set
$\emptyset \neq X \subseteq \mathbb{R}$ satisfying

\[ X = \bigcup_{i\in\Lambda} \varphi_i X. \tag{1} \]

The self-similar measure associated to a probability vector $(p_i)_{i\in\Lambda}$ is the
unique Borel probability measure $\mu$ on $\mathbb{R}$ satisfying

\[ \mu = \sum_{i\in\Lambda} p_i \cdot \varphi_i\mu. \tag{2} \]

Here $\varphi\mu = \mu\circ\varphi^{-1}$ denotes the push-forward of $\mu$ by $\varphi$.

When the images $\varphi_i X$ are disjoint or satisfy various weaker separation assumptions,
the small-scale structure of these objects is well understood, and in particular the
Hausdorff dimension $\dim X$ of $X$ is equal to the similarity dimension^1
$\operatorname{s-dim} X$, i.e. the unique solution $s \ge 0$ of the equation
$\sum_{i\in\Lambda} |r_i|^s = 1$. Defining the dimension of a measure^2 $\theta$ by

\[ \dim\theta = \inf\{\dim E : \theta(E) > 0\} \]

and assuming again sufficient separation of the images $\varphi_i X$, the dimension
$\dim\mu$ of a self-similar measure $\mu$ is equal to the similarity dimension of $\mu$,
defined by

\[ \operatorname{s-dim}\mu = \frac{\sum p_i \log p_i}{\sum p_i \log r_i}. \]
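The two similarity dimensions defined above are straightforward to evaluate numerically. The following is a minimal sketch, not part of the paper: the function names are hypothetical, the equation $\sum |r_i|^s = 1$ is solved by bisection (one choice among many), and $\log|r_i|$ is used in the denominator so that negative ratios are handled, which is an assumption about how the formula should be read when some $r_i < 0$.

```python
import math


def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(|r_i|**s) == 1 for s >= 0 by bisection (hypothetical helper)."""
    rs = [abs(r) for r in ratios]
    if len(rs) < 2 or not all(0.0 < r < 1.0 for r in rs):
        raise ValueError("need at least two ratios with 0 < |r_i| < 1")

    def excess(s):
        # Strictly decreasing in s, and positive at s = 0 because len(rs) >= 2.
        return sum(r ** s for r in rs) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:  # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def similarity_dimension_of_measure(probs, ratios):
    """(sum p_i log p_i) / (sum p_i log |r_i|); the base of the logarithm cancels."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    num = sum(p * math.log(p) for p in probs if p > 0.0)
    den = sum(p * math.log(abs(r)) for p, r in zip(probs, ratios))
    return num / den
```

For the middle-thirds Cantor system (two maps of ratio 1/3) both functions return $\log 2/\log 3$ when the weights are equal, and for three maps of ratio 1/3 the similarity dimension is 1, as in the Furstenberg example discussed later.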

When the images $\varphi_i X$ have significant overlap, however, far less is known about
the structure, or even the dimension, of these objects. One can give trivial bounds: the
dimension is never greater than the similarity dimension, and it is never greater than
the dimension of the ambient space $\mathbb{R}$, which is 1. Hence

\[ \dim X \le \min\{1, \operatorname{s-dim} X\} \tag{3} \]
\[ \dim \mu \le \min\{1, \operatorname{s-dim} \mu\} \tag{4} \]

However, without special combinatorial assumptions on the IFS, current methods are
unable even to decide whether or not equality holds in (3) and (4), let alone compute
the dimension exactly. The exception is when there are sufficiently many exact overlaps
among the "cylinders" of the IFS. More precisely, for $i = i_1 \ldots i_n \in \Lambda^n$ write

\[ \varphi_i = \varphi_{i_1} \circ \ldots \circ \varphi_{i_n}. \]

One says that exact overlaps occur if there is an $n$ and distinct $i,j \in \Lambda^n$ such that
$\varphi_i = \varphi_j$ (in particular the images $\varphi_i X$ and $\varphi_j X$ coincide).^3
If this occurs then $X$ and $\mu$ can be expressed using an IFS $\Psi$ which is a proper subset
of $\{\varphi_i\}_{i\in\Lambda^n}$, and a strict inequality in (3) and (4) may follow from the
corresponding bound for $\Psi$.

1.2 Main results 

This work was motivated by the folklore conjecture that the occurrence of exact overlaps
is the only mechanism which can lead to a strict inequality in (3) and (4) (see e.g.
[24, question 2.6]). Our main result lends some support to the conjecture and proves
some special cases of it. All of our results hold, with suitable modifications, in higher
dimensions, but this will appear separately.

^1 This notation is imprecise, since the similarity dimension depends on the IFS $\Phi$ rather
than the attractor $X$, but the meaning should always be clear from the context. A similar
remark holds for the similarity dimension of measures.

^2 This is the lower Hausdorff dimension. There are many other notions of dimension but for
self-similar measures all the major ones coincide since such measures are exact dimensional [8].

^3 One should note that if $i \in \Lambda^k$, $j \in \Lambda^m$ and $\varphi_i = \varphi_j$, then
$i$ cannot be a prefix of $j$ and vice versa, so $ij, ji \in \Lambda^{k+m}$ are distinct and
$\varphi_{ij} = \varphi_{ji}$. This shows that our definition of exact overlaps includes
coincidence of cylinders at "different generations".

Fix $\Phi = \{\varphi_i\}_{i\in\Lambda}$ as in the previous section and for $i \in \Lambda^n$ write
$r_i = r_{i_1} \cdot \ldots \cdot r_{i_n}$, which is the contraction ratio of $\varphi_i$. Define
the distance between the cylinders associated to $i,j \in \Lambda^n$ by

\[ d(i,j) = \begin{cases} \infty & r_i \neq r_j \\ |\varphi_i(0) - \varphi_j(0)| & r_i = r_j. \end{cases} \]

Note that $d(i,j) = 0$ if and only if $\varphi_i = \varphi_j$ and that the definition is
unchanged if 0 is replaced by any other point. For $n \in \mathbb{N}$ let

\[ \Delta_n = \min\{d(i,j) : i,j \in \Lambda^n,\ i \neq j\}. \]

Exact overlaps occur if and only if $\Delta_n = 0$ for some $n$ (equivalently all sufficiently
large $n$). One also always has exponential decay of $\Delta_n$. Indeed, all of the points
$\varphi_i(0)$, $i \in \Lambda^n$, belong to a fixed bounded interval (independent of $n$), and the
exponentially many sequences $i \in \Lambda^n$ give rise to only polynomially many contraction
ratios $r_i$. Therefore there are distinct $i,j \in \Lambda^n$ with $r_i = r_j$ and
$|\varphi_i(0) - \varphi_j(0)| < |\Lambda|^{-(1-o(1))n}$, which implies that $\Delta_n \to 0$
exponentially. In general this is all one can say, since in many cases there is an exponential
lower bound for $\Delta_n$. Such a lower bound occurs when the images $\varphi_i(X)$ are disjoint
but also sometimes when they intersect, for instance in Garsia's example and in the examples
considered in Theorems 1.5 and 1.6 below.
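For small $n$ the quantity $\Delta_n$ can be computed by brute force directly from the definition. The sketch below is illustrative only: the names `cylinder_data` and `delta_n` are hypothetical, and the parameters are assumed to be rational so that `fractions.Fraction` can decide the equalities $r_i = r_j$ and $\varphi_i(0) = \varphi_j(0)$ exactly (floating point would blur exactly the distinction that matters here). The cost grows like $|\Lambda|^n$.

```python
import math
from fractions import Fraction
from itertools import product


def cylinder_data(maps, n):
    """maps: list of (r, a) describing x -> r*x + a, with Fraction entries.
    Returns one pair (r_i, phi_i(0)) for every word i of length n."""
    out = []
    for word in product(range(len(maps)), repeat=n):
        ratio, x = Fraction(1), Fraction(0)
        # phi_i = phi_{i_1} o ... o phi_{i_n}, so the last letter acts first.
        for idx in reversed(word):
            r, a = maps[idx]
            x = r * x + a
            ratio *= r
        out.append((ratio, x))
    return out


def delta_n(maps, n):
    """min of d(i, j) over distinct words i, j of length n (math.inf if no
    two distinct words share a contraction ratio)."""
    by_ratio = {}
    for ratio, x in cylinder_data(maps, n):
        by_ratio.setdefault(ratio, []).append(x)
    best = math.inf
    for points in by_ratio.values():
        points.sort()
        # After sorting, the closest pair is adjacent; coinciding points give 0.
        for u, v in zip(points, points[1:]):
            best = min(best, v - u)
    return best
```

For the middle-thirds Cantor system the value is $2 \cdot 3^{-n}$ (an exponential lower bound), while for the three maps $x/2$, $x/2 + 1/2$, $x/2 + 1$ the words $(1,0)$ and $(0,2)$ define the same map, so $\Delta_2 = 0$.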

Theorem 1.1. If $\mu$ is a self-similar measure on $\mathbb{R}$ and if
$\dim\mu < \min\{1, \operatorname{s-dim}\mu\}$, then $\Delta_n \to 0$ super-exponentially,
i.e. $\lim(-\frac{1}{n}\log\Delta_n) = \infty$.

Note that the conclusion is in terms of the sequence $\Delta_n$, which is determined by
the IFS $\Phi$, not the measure. Thus if the conclusion fails, the hypothesis must fail for
all self-similar measures of $\Phi$. Every self-similar set $X$ supports a self-similar measure
$\mu$ with $\operatorname{s-dim}\mu = \operatorname{s-dim} X$, and since we always have
$\dim\mu \le \dim X$, we conclude:

Corollary 1.2. If $X$ is the attractor of an IFS on $\mathbb{R}$ and if
$\dim X < \min\{1, \operatorname{s-dim} X\}$, then $\lim(-\frac{1}{n}\log\Delta_n) = \infty$.

Theorem 1.1 is derived from a more quantitative result about the entropy of finite
approximations of $\mu$. Write $H(\mu,\mathcal{E})$ for the Shannon entropy of a measure $\mu$
with respect to a partition $\mathcal{E}$, and $H(\mu,\mathcal{E}|\mathcal{F})$ for the
conditional entropy on $\mathcal{F}$; see Section 3.1. For $n \in \mathbb{Z}$ the dyadic
partition of $\mathbb{R}$ into intervals of length $2^{-n}$ is

\[ \mathcal{D}_n = \{[\tfrac{k}{2^n}, \tfrac{k+1}{2^n}) : k \in \mathbb{Z}\}. \]

For $t \in \mathbb{R}$ we also write $\mathcal{D}_t = \mathcal{D}_{[t]}$. We remark that
$\liminf \frac{1}{n}H(\theta,\mathcal{D}_n) \ge \dim\theta$ for any probability measure
$\theta$, and the limit exists and is equal to $\dim\theta$ when $\theta$ is exact
dimensional, which is the case for self-similar measures [8].
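For a finitely supported measure the scale-$n$ entropy $H(\theta,\mathcal{D}_n)$ is a finite sum and can be evaluated directly. The following is a small sketch under the assumption that the measure is given as a list of `(atom, mass)` pairs; the function name is hypothetical, and floating-point atoms lying extremely close to a dyadic endpoint may be assigned to the neighbouring cell.

```python
import math
from collections import defaultdict


def dyadic_entropy(atoms, n):
    """Shannon entropy (in bits) of a finitely supported measure with respect
    to the partition D_n into intervals [k / 2**n, (k + 1) / 2**n)."""
    cells = defaultdict(float)
    for x, mass in atoms:
        cells[math.floor(x * 2 ** n)] += mass  # index k of the cell containing x
    return -sum(p * math.log2(p) for p in cells.values() if p > 0)
```

For the uniform measure on $\{0, 1/4, 1/2, 3/4\}$ the entropy is 0, 1 and 2 bits at levels 0, 1 and 2, and it stays at 2 bits for every finer level, since the atoms are already separated at level 2.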

We first consider the case that $\Phi$ is uniformly contracting, i.e. that all $r_i$ are equal
to some fixed $r$. Fix a self-similar measure $\mu$ defined by some probability vector
$(p_i)_{i\in\Lambda}$ and for $i \in \Lambda^n$ write $p_i = p_{i_1} \cdot \ldots \cdot p_{i_n}$.
Without loss of generality one can assume that 0 belongs to the attractor $X$. Define the
$n$-th generation approximation of $\mu$ by

\[ \nu^{(n)} = \sum_{i\in\Lambda^n} p_i \cdot \delta_{\varphi_i(0)}. \]

This is a probability measure on $X$ and $\nu^{(n)} \to \mu$ weakly. Moreover, writing

\[ n' = n\log_2(1/r), \]

$\nu^{(n)}$ closely resembles $\mu$ up to scale $2^{-n'} = r^n$ in the sense that

\[ \lim_{n\to\infty} \frac{1}{n'} H(\nu^{(n)},\mathcal{D}_{n'}) = \dim\mu. \]

The main question we are interested in is the behavior of $\nu^{(n)}$ at smaller scales. Observe
that the entropy $H(\nu^{(n)},\mathcal{D}_{n'})$ of $\nu^{(n)}$ at scale $2^{-n'}$ may not exhaust
the entropy $H(\nu^{(n)})$ of $\nu^{(n)}$ as a discrete measure (i.e. with respect to the partition
into points). If there is substantial excess entropy it is natural to ask at what scale and at
what rate it appears; it must appear eventually because
$\lim_{k\to\infty} H(\nu^{(n)},\mathcal{D}_k) = H(\nu^{(n)})$. The excess entropy at scale $k$
relative to the entropy at scale $n'$ is just the conditional entropy

\[ H(\nu^{(n)},\mathcal{D}_k|\mathcal{D}_{n'}) = H(\nu^{(n)},\mathcal{D}_k) - H(\nu^{(n)},\mathcal{D}_{n'}). \]

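Theorem 1.3. Let $\mu$ be a self-similar measure on $\mathbb{R}$ defined by an IFS with uniform
contraction ratios. Let $\nu^{(n)}$ be as above. If $\dim\mu < 1$, then

\[ \lim_{n\to\infty} \frac{1}{n'} H(\nu^{(n)},\mathcal{D}_{qn'}|\mathcal{D}_{n'}) = 0 \quad\text{for every } q > 1. \tag{6} \]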
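The normalized excess entropy appearing in (6) can be evaluated for small $n$ by enumerating the atoms of the $n$-th generation approximation. The sketch below is only an illustration with hypothetical function names; it assumes a common ratio $0 < |r| < 1$, reads $\mathcal{D}_t$ as the dyadic partition of level $\lfloor t \rfloor$, uses floating point, and enumerates all $|\Lambda|^n$ words, so it says nothing about the limit itself.

```python
import math
from collections import defaultdict
from itertools import product


def generation_measure(r, translations, probs, n):
    """Atoms (phi_i(0), p_i) of the n-th generation measure for the maps
    x -> r*x + a with a in `translations`, over all words i of length n."""
    atoms = []
    for word in product(range(len(translations)), repeat=n):
        x, p = 0.0, 1.0
        for idx in reversed(word):  # the last letter of the word acts first
            x = r * x + translations[idx]
            p *= probs[idx]
        atoms.append((x, p))
    return atoms


def dyadic_entropy(atoms, level):
    """Entropy in bits with respect to dyadic intervals of length 2**-level."""
    cells = defaultdict(float)
    for x, p in atoms:
        cells[math.floor(x * 2.0 ** level)] += p
    return -sum(p * math.log2(p) for p in cells.values() if p > 0)


def excess_entropy_rate(r, translations, probs, n, q):
    """(1/n') * H(nu^(n), D_{q n'} | D_{n'}) with n' = n * log2(1/|r|)."""
    n_prime = n * math.log2(1.0 / abs(r))
    coarse = math.floor(n_prime)
    fine = math.floor(q * n_prime)
    atoms = generation_measure(r, translations, probs, n)
    return (dyadic_entropy(atoms, fine) - dyadic_entropy(atoms, coarse)) / n_prime
```

When the cylinders of generation $n$ are already separated at scale $2^{-n'}$, or when the only coincidences are exact overlaps, there is no entropy left to appear at finer scales and the rate is 0; a positive value indicates distinct atoms of $\nu^{(n)}$ that share a cell of $\mathcal{D}_{n'}$.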
We now formulate the result in the non-uniformly contracting case. Let

\[ r = \prod_{i\in\Lambda} r_i^{p_i} \]

so that $\log r$ is the average logarithmic contraction ratio when $\varphi_i$ is chosen randomly
with probability $p_i$. Note that, by the law of large numbers, with probability tending
to 1, an element $i \in \Lambda^n$ chosen according to the probabilities $p_i$ will satisfy
$r_i = r^{n(1+o(1))} = 2^{-n'(1+o(1))}$.

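With this definition and $\nu^{(n)}$ defined as before, the theorem above holds as stated,
but note that now the partitions $\mathcal{D}_k$ are not suitable for detecting exact overlaps,
since $\varphi_i(0) = \varphi_j(0)$ may happen for some $i,j \in \Lambda^n$ with $r_i \neq r_j$.
To correct this define the probability measure $\tilde\nu^{(n)}$ on $\mathbb{R}\times\mathbb{R}$ by

\[ \tilde\nu^{(n)} = \sum_{i\in\Lambda^n} p_i \cdot \delta_{(\varphi_i(0),\, r_i)} \]

and the partition of $\mathbb{R}\times\mathbb{R}$ given by

\[ \tilde{\mathcal{D}}_n = \mathcal{D}_n \times \mathcal{F} \]

where $\mathcal{F}$ is the partition of $\mathbb{R}$ into points.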

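Theorem 1.4. Let $\mu$ be a self-similar measure on $\mathbb{R}$ and $\tilde\nu^{(n)}$ as above.
If $\dim\mu < 1$, then

\[ \lim_{n\to\infty} \frac{1}{n'} H(\tilde\nu^{(n)},\tilde{\mathcal{D}}_{qn'}|\tilde{\mathcal{D}}_{n'}) = 0 \quad\text{for every } q > 1. \tag{7} \]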

To derive Theorem 1.1, let $\mu$ be as in the last theorem with
$\dim\mu < \min\{1, \operatorname{s-dim}\mu\}$. The conclusion of the last theorem is equivalent
to $\frac{1}{n'}H(\tilde\nu^{(n)},\tilde{\mathcal{D}}_{qn'}) \to \dim\mu$ for every $q > 1$. Hence
for a given $q$ and all sufficiently large $n$ we will have
$\frac{1}{n'}H(\tilde\nu^{(n)},\tilde{\mathcal{D}}_{qn'}) < \operatorname{s-dim}\mu$. Since
$\tilde\nu^{(n)} = \sum_{i\in\Lambda^n} p_i \cdot \delta_{(\varphi_i(0),r_i)}$, if each pair
$(\varphi_i(0), r_i)$ in the sum belonged to a different atom of $\tilde{\mathcal{D}}_{qn'}$ then
we would have
$\frac{1}{n'}H(\tilde\nu^{(n)},\tilde{\mathcal{D}}_{qn'}) = -\frac{1}{n\log(1/r)}\sum_{i\in\Lambda^n} p_i\log p_i = \operatorname{s-dim}\mu$,
a contradiction. Thus there must be distinct $i,j \in \Lambda^n$ for which
$(\varphi_i(0), r_i)$ and $(\varphi_j(0), r_j)$ lie in the same atom of
$\tilde{\mathcal{D}}_{qn'}$, giving $\Delta_n \le 2^{-qn'}$.

1.3 Outline of the proof



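Let us say a few words about the proofs. For simplicity we discuss Theorem 1.3, where
there is a common contraction ratio $r$ to all the maps. For a self-similar measure
$\mu = \sum_{i\in\Lambda} p_i \cdot \varphi_i\mu$, iterate the relation $n$ times to get
$\mu = \sum_{i\in\Lambda^n} p_i \cdot \varphi_i\mu$. Since each $\varphi_i$, $i \in \Lambda^n$,
contracts by $r^n$, all the measures $\varphi_i\mu$, $i \in \Lambda^n$, are translates of each
other, so this can be re-written as a convolution

\[ \mu = \nu^{(n)} * \tau^{(n)} \]

where as before $\nu^{(n)} = \sum_{i\in\Lambda^n} p_i \cdot \delta_{\varphi_i(0)}$, and
$\tau^{(n)}$ is a translate of $\mu$ scaled down by $r^n$.

Fix $q$ and write $a \approx b$ to indicate that the difference tends to 0 as $n \to \infty$.
From the entropy identity
$H(\mu,\mathcal{D}_{(q+1)n'}) = H(\mu,\mathcal{D}_{n'}) + H(\mu,\mathcal{D}_{(q+1)n'}|\mathcal{D}_{n'})$
and the fact that $\frac{1}{n'}H(\mu,\mathcal{D}_{n'}) \approx \frac{1}{n'}H(\nu^{(n)},\mathcal{D}_{n'})$,
we find that the mean entropy

\[ A = \frac{1}{(q+1)n'} H(\mu,\mathcal{D}_{(q+1)n'}) \]

is approximately a convex combination $A \approx \frac{1}{q+1}B + \frac{q}{q+1}C$ of the mean
entropy

\[ B = \frac{1}{n'} H(\nu^{(n)},\mathcal{D}_{n'}) \]

and the mean conditional entropy

\[ C = \frac{1}{qn'} H(\mu,\mathcal{D}_{(q+1)n'}|\mathcal{D}_{n'}) \approx \frac{1}{qn'} \sum_{I\in\mathcal{D}_{n'}} \nu^{(n)}(I)\cdot H(\nu^{(n)}_I * \tau^{(n)},\mathcal{D}_{(q+1)n'}) \]

where $\nu^{(n)}_I$ is the conditional measure of $\nu^{(n)}$ on $I$. Since $A \approx \dim\mu$ and
$B \approx \dim\mu$, we find that $C \approx \dim\mu$ as well. On the other hand we also have
$\frac{1}{qn'}H(\tau^{(n)},\mathcal{D}_{(q+1)n'}) \approx \dim\mu$, thus

\[ \frac{1}{qn'} H(\nu^{(n)}_I * \tau^{(n)},\mathcal{D}_{(q+1)n'}) \approx C \approx \dim\mu \approx \frac{1}{qn'} H(\tau^{(n)},\mathcal{D}_{(q+1)n'}) \tag{8} \]

for large $n$ and typical $I \in \mathcal{D}_{n'}$. The argument is then concluded by showing
that (8) implies that either $\frac{1}{qn'}H(\tau^{(n)},\mathcal{D}_{(q+1)n'}) \approx 1$ (leading
to $\dim\mu = 1$), or that typically
$\frac{1}{qn'}H(\nu^{(n)}_I,\mathcal{D}_{(q+1)n'}) \approx 0$ (leading to (6)).

Now, for a general pair of measures $\nu, \tau$ the relation
$\frac{1}{k}H(\nu*\tau,\mathcal{D}_k) \approx \frac{1}{k}H(\tau,\mathcal{D}_k)$ analogous to (8)
does not have such an implication. But, while we know nothing about the structure of
$\nu^{(n)}_I$, we do know that $\tau^{(n)}$, being self-similar, is highly uniform at different
scales. We will be able to utilize this fact to draw the desired conclusion. Evidently, the
main ingredient in the argument is an analysis of the growth of measures under convolution,
which will occupy us starting in Section 2.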



1.4 Applications 

Theorem 1.1 and its corollaries settle a number of cases of the aforementioned conjecture.
Specifically, in any class of IFSs where one can prove that cylinders are either
equal or exponentially separated, the only possible cause of dimension drop is the
occurrence of exact overlaps. Thus,






Theorem 1.5. For IFSs on $\mathbb{R}$ defined by algebraic parameters, there is a dichotomy:
Either there are exact overlaps or the attractor $X$ satisfies
$\dim X = \min\{1, \operatorname{s-dim} X\}$.

Proof. Let $\varphi_i(x) = r_i x + a_i$ and suppose $r_i, a_i$ are algebraic. For distinct
$i,j \in \Lambda^n$ the distance $|\varphi_i(0) - \varphi_j(0)|$ is a polynomial of degree $n$ in
$r_i, a_i$, and hence is either equal to 0, or is $\ge s^n$ for some constant $s > 0$ depending
only on the numbers $r_i, a_i$ (see Lemma 5.10). Thus, in the absence of exact overlaps,
$\Delta_n \ge s^n$, and the conclusion follows from Corollary 1.2. □

There are a handful of cases where a similar argument can handle non-algebraic
parameters. Among these is a well-known conjecture by Furstenberg from the 1970s
on the linear images of the "one dimensional Sierpinski gasket". Let
$F \subseteq \mathbb{R}^2$ be given by

\[ F = \Big\{\sum_{n=1}^\infty (i_n,j_n)3^{-n} : (i_n,j_n) \in \{(0,0),(1,0),(0,1)\}\Big\}, \]

and let $\pi_t : \mathbb{R}^2 \to \mathbb{R}$ denote the linear map

\[ \pi_t(x,y) = tx + y. \]

Then $F_t = \pi_t F$ is a self-similar subset of $\mathbb{R}$ defined by the contractions

\[ x \mapsto \tfrac{1}{3}x, \qquad x \mapsto \tfrac{1}{3}(x+1), \qquad x \mapsto \tfrac{1}{3}(x+t). \tag{9} \]

Furstenberg conjectured that $\dim \pi_t F = 1$ for all irrational $t$ (see e.g. [24, question
2.5]). To relate this to our main conjecture, note that $\operatorname{s-dim} F_t = 1$ for all
$t$ and that exact overlaps occur only for certain rational values of $t$. From general
considerations such as Marstrand's theorem, we know that $\dim F_t = 1$ for a.e. $t$, and Kenyon
showed that this holds also for a dense $G_\delta$ set of $t$ [18]. In the same paper Kenyon also
classified those rational $t$ for which $\dim F_t = 1$, and showed that $F_t$ has Lebesgue
measure 0 for all irrational $t$ (strengthening the conclusion of a general theorem of
Besicovitch that gives this for a.e. $t$). For some other partial results see also [30].

Theorem 1.6. If $t \notin \mathbb{Q}$ then $\dim F_t = 1$.

Proof. Fix $t$, and suppose that $\dim F_t < 1$. Let $\Lambda = \{0,1,t\}$ and
$\varphi_i(x) = \frac{1}{3}(x + i)$, so $F_t$ is the attractor of $\{\varphi_i\}_{i\in\Lambda}$. Let
$X_n = \{\sum_{i=1}^n a_i 3^{-i} : a_i \in \{\pm 1, 0\}\}$. For each $n$ and $i,j \in \Lambda^n$ we
have $|\varphi_i(0) - \varphi_j(0)| = |p_{i,j} - t \cdot q_{i,j}|$ for some
$p_{i,j}, q_{i,j} \in X_n$, so there are $p_n, q_n \in X_n$ such that
$\Delta_n = |p_n - t \cdot q_n|$. Now, by Corollary 1.2, $|p_n - t \cdot q_n| = \Delta_n < 30^{-n}$
for all large enough $n$, which, since $|q_n| \ge 3^{-n}$, gives $|t - p_n/q_n| < 10^{-n}$.
Subtracting successive terms, by the triangle inequality we have

\[ \Big|\frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n}\Big| < 2\cdot 10^{-n} \quad\text{for large enough } n. \]

But $p_n, q_n, p_{n+1}, q_{n+1} \in X_{n+1}$, so $p_{n+1}/q_{n+1} - p_n/q_n$ is rational with
denominator at most $9^{n+1}/4$, giving

\[ \frac{p_{n+1}}{q_{n+1}} \neq \frac{p_n}{q_n} \quad\Longrightarrow\quad \Big|\frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n}\Big| \ge 4\cdot 9^{-(n+1)}. \]

For large $n$ the last two inequalities are incompatible unless $p_n/q_n = p_{n+1}/q_{n+1}$. In
other words, there is an $n_0$ such that $|t - p_{n_0}/q_{n_0}| < 10^{-n}$ for $n > n_0$, which gives
$t = p_{n_0}/q_{n_0}$. □

The argument above is due to B. Solomyak and P. Shmerkin and we thank them for
permission to include it here. Similar considerations work in a few other cases, but one
already runs into difficulties if in the example above we replace the contraction ratio
1/3 with a general non-algebraic $0 < r < 1$ (see also the discussion following Theorem
1.9 below).

In the absence of a resolution of the general conjecture, we turn to parametric
families of self-similar sets and measures. The study of parametric families of general
sets and measures is classical; examples include the projection theorems of Besicovitch
and Marstrand and more recent results like those of Peres-Schlag [21] and Bourgain [3].
When the sets and measures in question are self-similar we shall see that the general
results can be strengthened considerably.

Let $I$ be a set of parameters, let $r_i : I \to (-1,1)\setminus\{0\}$ and
$a_i : I \to \mathbb{R}$, $i \in \Lambda$. For each $t \in I$ define
$\varphi_{i,t} : \mathbb{R} \to \mathbb{R}$ by $\varphi_{i,t}(x) = r_i(t)(x - a_i(t))$. For a
sequence $i \in \Lambda^n$ let $\varphi_{i,t} = \varphi_{i_1,t} \circ \ldots \circ \varphi_{i_n,t}$
and define

\[ \Delta_{i,j}(t) = \varphi_{i,t}(0) - \varphi_{j,t}(0). \tag{10} \]

The quantity $\Delta_n = \Delta_n(t)$ associated as in the previous section to the IFS
$\{\varphi_{i,t}\}_{i\in\Lambda}$ is not smaller than the minimum of $|\Delta_{i,j}(t)|$ over
distinct $i,j \in \Lambda^n$ (since it is the minimum over pairs $i,j$ with $r_i = r_j$). Thus,
$\Delta_n \to 0$ super-exponentially implies that
$\min\{|\Delta_{i,j}(t)| : i,j \in \Lambda^n,\ i \neq j\} \to 0$ super-exponentially as well, so
Theorem 1.1 has the following formal implication.

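Theorem 1.7. Let $\Phi_t = \{\varphi_{i,t}\}$ be a parametrized IFS as above. For every
$\varepsilon > 0$ let

\[ E_\varepsilon = \bigcup_{N=1}^{\infty} \bigcap_{n>N} \Big( \bigcup_{i,j\in\Lambda^n,\ i\neq j} (\Delta_{i,j})^{-1}(-\varepsilon^n,\varepsilon^n) \Big) \tag{11} \]

and

\[ E = \bigcap_{\varepsilon>0} E_\varepsilon. \tag{12} \]

Then for $t \in I \setminus E$, for every probability vector $p = (p_i)$ the associated
self-similar measure $\mu_t$ of $\Phi_t$ satisfies
$\dim\mu_t = \min\{1, \operatorname{s-dim}\mu_t\}$ and the attractor $X_t$ of $\Phi_t$ satisfies
$\dim X_t = \min\{1, \operatorname{s-dim} X_t\}$.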

Our goal is to show that the set $E$ is small. We restrict ourselves to the case that
$I \subseteq \mathbb{R}$ is a compact interval; a multi-parameter version will appear in [13].
Extend the definition of $\Delta_{i,j}$ to infinite sequences $i,j \in \Lambda^{\mathbb{N}}$ by

\[ \Delta_{i,j}(t) = \lim_{n\to\infty} \Delta_{i_1\ldots i_n,\, j_1\ldots j_n}(t). \tag{13} \]

Convergence is uniform over $I$ and $i,j$, and if $a_i(\cdot)$ and $r_i(\cdot)$ are real analytic,
so are the functions $\Delta_{i,j}(\cdot)$.

Theorem 1.8. Let $I \subseteq \mathbb{R}$ be a compact interval, let
$r_i : I \to (-1,1)\setminus\{0\}$ and $a_i : I \to \mathbb{R}$ be real analytic, and let
$\Phi_t = \{\varphi_{i,t}\}_{i\in\Lambda}$ be the associated parametric family of IFSs, as above.
Suppose that

\[ \forall i,j \in \Lambda^{\mathbb{N}} \quad \big( \Delta_{i,j} \equiv 0 \text{ on } I \iff i = j \big). \]

Then the set $E$ of "exceptional" parameters in Theorem 1.7 has Hausdorff and packing
dimension 0.

Most existing results on parametric families of IFSs are based on the so-called
transversality method, introduced by Pollicott and Simon [25] and developed, among
others, by Solomyak [29] and Peres-Schlag [21]. Theorem 1.8 is based on a similar but
much weaker "higher order" transversality condition, which is automatically satisfied
under the stated hypothesis. We give the details in Section 5.4. See [28] for an effective
derivation of higher-order transversality in certain contexts.

As a demonstration we apply this to the Bernoulli convolutions problem. For
$0 < \lambda < 1$ let $\nu_\lambda$ denote the distribution of the real random variable
$\sum_{n=0}^\infty \pm\lambda^n$, where the signs are chosen i.i.d. with equal probabilities. The
name derives from the fact that $\nu_\lambda$ is the infinite convolution of the measures
$\frac{1}{2}(\delta_{-\lambda^n} + \delta_{\lambda^n})$, $n = 0,1,2,\ldots$, but the pertinent fact
for us is that $\nu_\lambda$ is a self-similar measure, given by assigning equal probabilities
to the contractions

\[ \varphi_\pm(x) = \lambda x \pm 1. \tag{14} \]

For $\lambda < \frac{1}{2}$ the measure is supported on a self-similar Cantor set of dimension
$< 1$, but for $\lambda \in [\frac{1}{2},1)$ the support is an interval, and it is a longstanding
problem to determine whether it is absolutely continuous. Exact overlaps can occur only for
certain algebraic $\lambda$, and Erdős showed that when $\lambda^{-1}$ is a Pisot number
$\nu_\lambda$ is in fact singular [5]. No other parameters $\lambda \in [\frac{1}{2},1)$ are known
for which $\nu_\lambda$ is singular. In the positive direction, it is known that $\nu_\lambda$ is
absolutely continuous for a.e. $\lambda \in [1/2,1)$ (Solomyak [29]), that the set of exceptional
$\lambda \in [a,1)$ has dimension $< 1 - C(a - 1/2)$ for some $C > 0$ (Peres-Schlag [21]), and
that its dimension tends to 0 as $a \to 1$ (Erdős [6]).

We shall consider the question of when $\dim\nu_\lambda = 1$. This is weaker than absolute
continuity but little more seems to be known about this question except the relatively
soft fact that the set of parameters with $\dim\nu_\lambda = 1$ is also topologically large
(contains a dense $G_\delta$ set); see [22]. In particular the only parameters
$\lambda \in [1/2,1)$ for which $\dim\nu_\lambda < 1$ is known are inverses of Pisot numbers
(Alexander-Yorke [1]). We also note that in many of the problems related to Bernoulli
convolutions it is the dimension of $\nu_\lambda$, rather than its absolute continuity, that is
relevant. For discussion of some applications see [22, Section 8] and [26].

Theorem 1.9. $\dim\nu_\lambda = 1$ outside a set of $\lambda$ of dimension 0.

Proof. Take the parametrization $r(t) = t$, $a_\pm(t) = \pm 1$ for
$t \in [1/2, 1-\varepsilon]$. Then $\Delta_{i,j}(t) = \sum (i_n - j_n)\cdot t^n$ and this vanishes
identically if and only if $i = j$, confirming the hypothesis of Theorem 1.8. □

Arguing as in the proof of Theorem 1.6, in order to show that $\dim\nu_\lambda = 1$ for all
non-algebraic $\lambda$, it would suffice to answer the following question in the affirmative:

Question 1.10. Let $\Pi_n$ denote the collection of polynomials of degree $\le n$ with
coefficients $0, \pm 1$. Does there exist a constant $s > 0$ such that for $\alpha, \beta$ that
are roots of polynomials in $\Pi_n$ either $\alpha = \beta$ or $|\alpha - \beta| > s^n$?

Classical bounds imply that this is true if s grows linearly in n, but we have not 
found an answer to the question in the literature. 

Another problem to which our methods apply is the Keane-Smorodinsky {0,1,3}-problem.
For details about the problem we refer to Pollicott-Simon [25] or
Keane-Smorodinsky-Solomyak [17].

Finally, our methods also can be adapted with minor changes to IFSs that "contract
on average" [20]. We restrict attention to a problem raised by Sinai [23] concerning the
maps $\varphi_- : x \mapsto (1-\alpha)x - 1$ and $\varphi_+ : x \mapsto (1+\alpha)x + 1$. A
composition of $n$ of these maps chosen i.i.d. with probability $\frac{1}{2}, \frac{1}{2}$
asymptotically contracts by approximately $(1-\alpha^2)^{n/2}$, and so for each
$0 < \alpha < 1$ there is a unique probability measure $\mu_\alpha$ on $\mathbb{R}$ satisfying
$\mu_\alpha = \frac{1}{2}\varphi_-\mu_\alpha + \frac{1}{2}\varphi_+\mu_\alpha$. Little is known
about the dimension or absolute continuity of $\mu_\alpha$ beyond upper bounds analogous to (4).
Some results in a randomized analog of this model have been obtained by Peres, Simon and
Solomyak [23]. We prove

Theorem 1.11. $\dim\mu_\alpha = \min\{1, \operatorname{s-dim}\mu_\alpha\}$ for
$\alpha \in (0,1)$ outside a set of Hausdorff (and packing) dimension 0.

For further discussion of this problem see Section 5.5.

1.5 Absolute continuity? 

There is another well-known conjecture, analogous to the one we started with, about
the absolute continuity of self-similar measures $\mu$ satisfying
$\operatorname{s-dim}\mu > 1$. Specifically, it has been suggested that such measures should
be absolutely continuous as long as there are no exact overlaps. The Bernoulli convolutions
problem discussed above is a special case of this conjecture.

Our methods at present are not able to address this. At a technical level, whenever
our methods give $\dim\mu = 1$ it is a consequence of showing that
$H(\mu,\mathcal{D}_n) = n - o(n)$. In contrast, absolute continuity would require better
asymptotics, e.g. $H(\mu,\mathcal{D}_n) = n - O(1)$ (see [12, Theorem 1.5]). More substantially,
our arguments do not distinguish between the critical ($\operatorname{s-dim}\mu = 1$) and
super-critical ($\operatorname{s-dim}\mu > 1$) phases, so in their present form they cannot
possibly give results about absolute continuity.

1.6 Notation and organization of the paper 

The main ingredients in the proofs are our results on the growth of convolutions of
measures. We develop this subject in the next three sections: Section 2 introduces the
statements and basic definitions, Section 3 contains some preliminaries on entropy and
convolutions, and Section 4 proves the main results on convolutions. In Section 5
we prove Theorem 1.1 and the other main results.

We follow standard notational conventions. $\mathbb{N} = \{1,2,3,\ldots\}$. All logarithms are
to base 2. $\mathcal{P}(X)$ is the space of probability measures on $X$, endowed with the weak-*
topology if appropriate. We follow standard "big O" notation: $O_\alpha(f(n))$ is an
unspecified function bounded in absolute value by $C \cdot f(n)$ for some constant
$C = C(\alpha)$ depending on $\alpha$. Similarly $o(1)$ is a quantity tending to 0 as the relevant
parameter $\to \infty$. The statement "for all $s$ and $t > t(s)$, ..." should be understood as
saying "there exists a function $t(\cdot)$ such that for all $s$ and $t > t(s)$, ...". If we want
to refer to a specific bound after the context where it is introduced we will designate it as
$t_1(\cdot)$, $t_2(\cdot)$, $t_*(\cdot)$, etc.

2 An inverse theorem for the entropy of convolutions
2.1 Entropy and additive combinatorics 

As we saw at the end of Section 1.3, a key ingredient in the proof of Theorem 1.3 is an
analysis of the growth of measures under convolution. This subject is of independent
interest and will occupy us for a large part of this paper.

It will be convenient to introduce the normalized scale-$n$ entropy

\[ H_n(\mu) = \frac{1}{n} H(\mu,\mathcal{D}_n). \]

Our aim is to obtain structural information about measures $\mu, \nu$ for which $\mu*\nu$ is
small in the sense that

\[ H_n(\mu*\nu) < H_n(\mu) + \delta, \tag{15} \]

where $\delta > 0$ is small but fixed, and $n$ is large.

This problem is a relative of classical ones in additive combinatorics concerning the
structure of sets $A, B$ whose sumset $A + B = \{a + b : a \in A,\ b \in B\}$ is appropriately
small. The general principle is that when the sum is small, the sets should have
some algebraic structure. Such results are known as inverse theorems. For example the
Freiman-Ruzsa theorem asserts that if $|A + B| \le C|A|$ then $A, B$ are close (in a manner
depending on $C$) to (generalized) arithmetic progressions (the converse is immediate).^4
For details and more discussion see e.g. [32].

With regard to entropy, in a recent paper Tao [31] obtained analogs of Freiman's
theorem for the entropy of discrete measures, showing essentially that if

\[ H_n(\mu*\nu) < H_n(\mu) + O(\tfrac{1}{n}) \tag{16} \]

then $\mu, \nu$ are close, in an appropriate sense, to uniform measures on (generalized)
arithmetic progressions.

The condition (15), however, is much weaker than (16) and it is harder to draw
conclusions from it about the global structure of $\mu$ (although some information can
be obtained using the asymmetric Balog-Szemerédi-Gowers theorem). Consider the
following example. Start with an arithmetic progression of length $n_1$ and gap
$\varepsilon_1$, and put the uniform measure on it. Now split each atom $x$ into an arithmetic
progression of length $n_2$ and gap $\varepsilon_2 < \varepsilon_1/n_2$, starting at $x$ (so the
entire progression fits in the space between $x$ and the next atom). Repeat this procedure $N$
times with parameters $n_i, \varepsilon_i$, and call the resulting measure $\mu$. Let $k$ be such
that $\varepsilon_N$ is of order $2^{-k}$. It is not hard to verify that we can have
$H_k(\mu) = 1/2$ but $|H_k(\mu) - H_k(\mu*\mu)|$ arbitrarily small. This example is actually a
(generalized) arithmetic progression, as predicted by Freiman-type theorems, but the rank can
be arbitrarily large. Furthermore if one conditions $\mu$ on an exponentially small subset of
its support one gets another example with similar properties that is quite far from a
generalized arithmetic progression.
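The example can be checked numerically for small parameters. The sketch below fixes one concrete choice that is not taken from the text: every level uses length $n_l = 2^m$ and gap $\varepsilon_l = 2^{-2ml}$, so that $k = 2mN$; the function names are hypothetical. With this choice $\mu$ has $2^{mN}$ equally weighted atoms in distinct cells of $\mathcal{D}_k$, giving $H_k(\mu) = 1/2$, while $\mu*\mu$ has at most $(2^{m+1}-1)^N$ atoms, giving $H_k(\mu*\mu) < 1/2 + 1/(2m)$. Atoms are stored as integers in units of $2^{-k}$, so each cell of $\mathcal{D}_k$ contains at most one of them.

```python
import math
from collections import Counter
from itertools import product


def nested_progression(m, N):
    """Atoms, as integers in units of 2**-k with k = 2*m*N, of the uniform
    measure obtained from N nested progressions of length 2**m whose gap at
    level l is 2**(-2*m*l)."""
    k = 2 * m * N
    atoms = [0]
    for level in range(1, N + 1):
        gap = 2 ** (k - 2 * m * level)  # eps_level expressed in units of 2**-k
        atoms = [x + j * gap for x in atoms for j in range(2 ** m)]
    return atoms, k


def normalized_entropy(counts, k):
    """H_k of the measure with the given integer weights; valid here because
    every atom is an integer multiple of 2**-k."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values()) / k


m, N = 3, 2
atoms, k = nested_progression(m, N)
h_mu = normalized_entropy(Counter(atoms), k)
h_conv = normalized_entropy(Counter(x + y for x, y in product(atoms, repeat=2)), k)
```

Here `h_mu` equals 1/2 and `h_conv - h_mu` is at most $1/(2m)$; increasing `m` shrinks the gap between the two, at the price of a quadratic blow-up in the enumeration of pairs.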

Our main contribution to this matter is Theorem 2.7 below, which shows that
constructions like the one above are, in a sense, the only way that (15) can occur. We
note that there is a substantial existing literature on the growth condition
$|A + B| \le |A|^{1+\delta}$, which is the sumset analog of (15). Such a condition appears in the
sum-product theorems of Bourgain-Katz-Tao [3] and the work of Katz-Tao [16], and in the
Euclidean setting more explicitly in Bourgain's work on the Erdős-Volkmann conjecture [2] and
Marstrand-like projection theorems [3]. However we have not found a result in the
literature that meets our needs and, in any event, we believe that the formulation
given here will find further applications.

^4 A generalized arithmetic progression is an affine image of a box in a higher-dimensional lattice.



2.2 Component measures 

The following notation will be needed in $\mathbb{R}^d$ as well as $\mathbb{R}$. Let
$\mathcal{D}_n^d = \mathcal{D}_n \times \ldots \times \mathcal{D}_n$ denote the dyadic partition of
$\mathbb{R}^d$; we often suppress the superscript when it is clear from the context. Let
$\mathcal{D}_n(x) \in \mathcal{D}_n$ denote the unique level-$n$ dyadic cell containing $x$.
For $D \in \mathcal{D}_n$ with $\mu(D) > 0$, let $T_D : \mathbb{R}^d \to \mathbb{R}^d$ be the
unique homothety mapping $D$ to $[0,1)^d$, and $T_D\mu$ the push-forward of $\mu$ through $T_D$.

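Definition 2.1. For $\mu \in \mathcal{P}(\mathbb{R}^d)$ and a dyadic cell $D$ with $\mu(D) > 0$,
the (raw) $D$-component of $\mu$ is

\[ \mu_D = \frac{1}{\mu(D)}\,\mu|_D \]

and the (rescaled) $D$-component is

\[ \mu^D = \frac{1}{\mu(D)}\,T_D(\mu|_D). \]

For $x \in \mathbb{R}^d$ with $\mu(\mathcal{D}_n(x)) > 0$ we write

\[ \mu_{x,n} = \mu_{\mathcal{D}_n(x)}, \qquad \mu^{x,n} = \mu^{\mathcal{D}_n(x)}. \]

Taken together as $x$ ranges over the support of $\mu$, these are the level-$n$ components of
$\mu$.

Our results on the multi-scale structure of $\mu \in \mathcal{P}(\mathbb{R}^d)$ are stated in
terms of the behavior of random components of $\mu$, defined as follows.^5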

Definition 2.2. Let $\mu \in \mathcal{P}(\mathbb{R}^d)$.

1. A random level-$n$ component, raw or rescaled, is the random measure $\mu_D$ or $\mu^D$,
respectively, obtained by choosing $D \in \mathcal{D}_n$ with probability $\mu(D)$; equivalently,
the random measure $\mu_{x,n}$ or $\mu^{x,n}$, respectively, with $x$ chosen according to $\mu$.

2. For a finite set $I \subseteq \mathbb{N}$, a random level-$I$ component, raw or rescaled, is
chosen by first choosing $n \in I$ uniformly, and independently choosing a raw or rescaled
level-$n$ component, respectively.

Notation 2.3. When the symbols $\mu^{x,i}$ and $\mu_{x,i}$ appear inside an expression
$\mathbb{P}(\ldots)$ or $\mathbb{E}(\ldots)$, they will always denote random variables drawn
according to the component distributions defined above. The range of $i$ will be specified as
needed.



^5 Definition 2.2 is motivated by Furstenberg's notion of a CP-distribution [9, 10, 14], which
arise as limits as $N \to \infty$ of the distribution of components of level $1,\ldots,N$. These
limits have a useful dynamical interpretation but in our finitary setting we do not require this
technology.

The definition is best understood with some examples. For
$A \subseteq \mathcal{P}([0,1]^d)$ we have

\[ \mathbb{P}_{0\le i\le n}(\mu^{x,i} \in A) = \frac{1}{n+1}\sum_{i=0}^{n} \int 1_A(\mu^{x,i})\,d\mu(x). \]

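This notation implicitly defines $x, i$ as random variables. Thus if
$A_0, A_1, \ldots \subseteq \mathcal{P}([0,1]^d)$ and $D \subseteq [0,1]^d$ we could write

\[ \mathbb{P}_{0\le i\le n}(\mu^{x,i} \in A_i \text{ and } x \in D) = \frac{1}{n+1}\sum_{i=0}^{n} \mu(x : \mu^{x,i} \in A_i \text{ and } x \in D). \]

Similarly, for $f : \mathcal{P}([0,1)^d) \to \mathbb{R}$ we have

\[ \mathbb{E}_{n\le i\le n+k}(f(\mu^{x,i})) = \frac{1}{k+1}\sum_{i=n}^{n+k} \int f(\mu^{x,i})\,d\mu(x). \]

When dealing with components of several measures $\mu, \nu$, we assume all choices of
components $\mu^{x,i}, \nu^{y,j}$ are independent unless otherwise stated. For instance,

\[ \mathbb{P}_{i=n}(\mu^{x,i} \in A,\ \nu^{y,i} \in B) = \int\!\!\int 1_A(\mu^{x,n})\cdot 1_B(\nu^{y,n})\,d\mu(x)\,d\nu(y), \]

where as usual $1_A$ is the indicator function on $A$, so $1_A(\omega) = 1$ if
$\omega \in A$ and 0 otherwise. We use the same notation to average a real sequence, thus given
$a_n, \ldots, a_{n+k} \in \mathbb{R}$,

\[ \mathbb{E}_{n\le i\le n+k}(a_i) = \frac{1}{k+1}\sum_{i=n}^{n+k} a_i. \]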

We record one obvious fact, which we will use repeatedly:

Lemma 2.4. For $\mu \in \mathcal{P}(\mathbb{R}^d)$ and $n \in \mathbb{N}$,

\[ \mu = \mathbb{E}_{i=n}(\mu_{x,i}). \]
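For a finitely supported measure on $[0,1)$ the component measures of Definition 2.1 and the averaging identity of Lemma 2.4 can be written out explicitly. The following sketch is an illustration only: the function names are hypothetical, the measure is assumed to be a dictionary from atoms to masses, and `fractions.Fraction` is used so that the identity can be checked exactly rather than up to rounding.

```python
import math
from collections import defaultdict
from fractions import Fraction


def level_n_components(mu, n, rescaled=False):
    """mu: dict {atom: mass} with Fraction atoms in [0, 1).
    Returns {k: (mu(D), component)} for each cell D = [k/2**n, (k+1)/2**n)
    of positive mass; the component is raw (mu_D) or rescaled (mu^D)."""
    cells = defaultdict(dict)
    for x, p in mu.items():
        if p > 0:
            cells[math.floor(x * 2 ** n)][x] = p
    out = {}
    for k, part in cells.items():
        mass = sum(part.values())
        comp = {}
        for x, p in part.items():
            # T_D is the homothety x -> 2**n * x - k, mapping D onto [0, 1).
            y = x * 2 ** n - k if rescaled else x
            comp[y] = p / mass
        out[k] = (mass, comp)
    return out


def expected_raw_component(mu, n):
    """E_{i=n}(mu_{x,i}) = sum over cells D of mu(D) * mu_D."""
    avg = defaultdict(Fraction)
    for mass, comp in level_n_components(mu, n).values():
        for x, p in comp.items():
            avg[x] += mass * p
    return dict(avg)
```

Choosing a cell $D$ with probability $\mu(D)$ and then sampling from $\mu_D$ is the same as sampling from $\mu$, which is why `expected_raw_component(mu, n)` returns `mu` itself for every `n`.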

2.3 An inverse theorem 

We first introduce finite-scale approximations of delta-masses and of uniform measures. 

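Definition 2.5. A measure $\mu \in \mathcal{P}([0,1])$ is $\varepsilon$-atomic if there is an
interval $I = B_\varepsilon(x)$ of length $2\varepsilon$ such that
$\mu(\mathbb{R} \setminus I) < \varepsilon$.

Note that if $\mu \in \mathcal{P}([0,1])$ is $O(2^{-m})$-atomic then $H_m(\mu) = O(1/m)$.

Definition 2.6. A measure $\mu \in \mathcal{P}([0,1])$ is $(\varepsilon,m)$-uniform if
$H_m(\mu) > 1 - \varepsilon$.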
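Both notions are easy to test for a finitely supported measure. The sketch below uses hypothetical function names and makes two assumptions that are not in the text: the measure is a list of `(atom, mass)` pairs with floating-point entries, and $B_\varepsilon(x)$ is read as the open interval $(x-\varepsilon, x+\varepsilon)$, so a set of atoms fits inside one such interval exactly when its diameter is less than $2\varepsilon$.

```python
import math
from collections import defaultdict


def is_atomic(atoms, eps):
    """True if some open interval of length 2*eps carries all but < eps of the mass."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    pts = sorted(atoms)
    best, window, left = 0.0, 0.0, 0
    for right in range(len(pts)):
        window += pts[right][1]
        # Shrink from the left until the window has diameter < 2*eps.
        while pts[right][0] - pts[left][0] >= 2 * eps:
            window -= pts[left][1]
            left += 1
        best = max(best, window)
    return 1.0 - best < eps


def is_uniform(atoms, eps, m):
    """True if the normalized scale-m entropy H_m exceeds 1 - eps."""
    cells = defaultdict(float)
    for x, p in atoms:
        cells[math.floor(x * 2 ** m)] += p
    h_m = -sum(p * math.log2(p) for p in cells.values() if p > 0) / m
    return h_m > 1.0 - eps
```

A point mass is $\varepsilon$-atomic for every $\varepsilon > 0$ and far from uniform, while the uniform measure on the 16 points $k/16$ is $(\varepsilon,4)$-uniform for every $\varepsilon > 0$ and not $0.1$-atomic.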

Now, the approximate equality $H_n(\mu*\nu) \approx H_n(\mu)$ occurs trivially if either $\mu$ is
close to uniform, or if $\nu$ is close to atomic. As we saw in Section 2.1 there are other ways
this can occur, but the following theorem shows that locally (for typical component
measures) the two trivial scenarios are essentially the only ones.

Theorem 2.7. For every $\varepsilon > 0$ and integer $m$ there is a
$\delta = \delta(\varepsilon,m) > 0$ such that for every $n > n(\varepsilon,\delta,m)$, the
following holds: if $\mu, \nu \in \mathcal{P}([0,1])$ and

\[ H_n(\mu*\nu) < H_n(\mu) + \delta, \]

then there are disjoint subsets $I, J \subseteq \{1,\ldots,n\}$ with
$|I \cup J| > (1-\varepsilon)n$, such that

\[ \mathbb{P}_{i=k}(\mu^{x,i} \text{ is } (\varepsilon,m)\text{-uniform}) > 1-\varepsilon \quad\text{for } k \in I \]
\[ \mathbb{P}_{i=k}(\nu^{x,i} \text{ is } 2^{-m}\text{-atomic}) > 1-\varepsilon \quad\text{for } k \in J. \]

The proof is given in Section 4.4. The proof is effective, but the dependencies we
obtain between $\delta$, $m$, and $n$ are very bad and certainly far from optimal. We do not
pursue this topic.

The alternatives in the theorem are not exclusive. To see this begin with a measure $\mu \in \mathcal{P}([0,1])$ such that $\dim(\mu * \mu) = \dim \mu = 1/2$, and such that $\lim H_n(\mu) = \lim H_n(\mu * \mu) = 1/2$ (such measures are not hard to construct by elementary means, or can be adapted from the more elaborate constructions in [9, 27]). By Marstrand's theorem, for a.e. $t$ the scaled measure $\nu(A) = \mu(tA)$ satisfies $\dim(\mu * \nu) = 1$ and hence $H_n(\mu * \nu) \to 1$. But it is easy to verify that, as the conclusion of the theorem holds for the pair $\mu, \mu$, it holds for $\mu, \nu$ as well.

Note that there is no assumption on the entropy of $\nu$, but if $H_n(\nu)$ is sufficiently close to $0$ the conclusion will automatically hold with $I$ empty, and if $H_n(\nu)$ is not too close to $0$ then $J$ cannot be too large relative to $n$ (this follows from the comments after Definition 2.5 and Lemma 3.4 below). We obtain the following useful conclusion.

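Theorem 2.8. For every $\varepsilon > 0$ and integer $m$, there is a $\delta = \delta(\varepsilon,m) > 0$ such that for every $n > n(\varepsilon, \delta, m)$ and every $\mu \in \mathcal{P}([0,1])$, if

$$\mathbb{P}_{0 \le i \le n}\left(H_m(\mu^{x,i}) < 1 - \varepsilon\right) > 1 - \varepsilon$$

then for every $\nu \in \mathcal{P}([0,1])$

$$H_n(\nu) > \varepsilon \implies H_n(\mu * \nu) > H_n(\mu) + \delta.$$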

Specializing the above to self-convolutions we have the following result, which shows that constructions like the one described in Section 2.1 are, roughly, the only way that $H_n(\mu * \mu) \approx H_n(\mu)$ can occur:

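Theorem 2.9. For every $\varepsilon > 0$ and integer $m$, there is a $\delta = \delta(\varepsilon,m) > 0$ such that for every sufficiently large $n > n(\varepsilon, \delta, m)$ and every $\mu \in \mathcal{P}([0,1))$, if

$$H_n(\mu * \mu) < H_n(\mu) + \delta$$

then there are disjoint subsets $I, J \subseteq \{0, \ldots, n\}$ with $|I \cup J| > (1-\varepsilon)n$ and such that

$$\mathbb{P}_{i=k}\left(\mu^{x,i} \text{ is } (\varepsilon,m)\text{-uniform}\right) > 1 - \varepsilon \quad \text{for } k \in I,$$
$$\mathbb{P}_{i=k}\left(\mu^{x,i} \text{ is } 2^{-m}\text{-atomic}\right) > 1 - \varepsilon \quad \text{for } k \in J.$$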

The theorems above hold more generally for compactly supported measures, but the parameters will depend on the diameter of the support. They can also be extended to measures with unbounded support under additional assumptions, see Section 5.5.

3 Entropy, atomicity, uniformity

3.1 Preliminaries on entropy 

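The Shannon entropy of a probability measure $\mu$ with respect to a countable partition $\mathcal{E}$ is given by

$$H(\mu, \mathcal{E}) = -\sum_{E \in \mathcal{E}} \mu(E) \log \mu(E),$$

where the logarithm is in base 2 and $0 \log 0 = 0$. The conditional entropy with respect to a countable partition $\mathcal{F}$ is

$$H(\mu, \mathcal{E} | \mathcal{F}) = \sum_{F \in \mathcal{F}} \mu(F) \cdot H(\mu_F, \mathcal{E}),$$

where $\mu_F = \frac{1}{\mu(F)}\mu|_F$ is the conditional measure on $F$. For a discrete probability measure $\mu$ we write $H(\mu)$ for the entropy with respect to the partition into points, and for a probability vector $\alpha = (\alpha_1, \ldots, \alpha_k)$ we write

$$H(\alpha) = -\sum \alpha_i \log \alpha_i.$$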
We collect here some standard properties of entropy. 

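Lemma 3.1. Let $\mu, \nu$ be probability measures on a common space, $\mathcal{E}, \mathcal{F}$ partitions of the underlying space and $\alpha \in [0,1]$.

1. $H(\mu, \mathcal{E}) \ge 0$, with equality if and only if $\mu$ is supported on a single atom of $\mathcal{E}$.

2. If $\mu$ is supported on $k$ atoms of $\mathcal{E}$ then $H(\mu, \mathcal{E}) \le \log k$.

3. If $\mathcal{F}$ refines $\mathcal{E}$ (i.e. $\forall F \in \mathcal{F}\ \exists E \in \mathcal{E}$ s.t. $F \subseteq E$) then $H(\mu, \mathcal{F}) \ge H(\mu, \mathcal{E})$.

4. If $\mathcal{E} \vee \mathcal{F} = \{E \cap F : E \in \mathcal{E}, F \in \mathcal{F}\}$ denotes the join of $\mathcal{E}$ and $\mathcal{F}$, then

$$H(\mu, \mathcal{E} \vee \mathcal{F}) = H(\mu, \mathcal{F}) + H(\mu, \mathcal{E} | \mathcal{F}).$$

5. $H(\cdot, \mathcal{E})$ and $H(\cdot, \mathcal{E} | \mathcal{F})$ are concave.

6. $H(\cdot, \mathcal{E})$ obeys the "convexity" bound

$$H(\alpha\mu + (1-\alpha)\nu, \mathcal{E}) \le \alpha H(\mu, \mathcal{E}) + (1-\alpha) H(\nu, \mathcal{E}) + H(\alpha, 1-\alpha).$$

In particular, we note that for $\mu \in \mathcal{P}([0,1]^d)$ we have the bounds $H(\mu, \mathcal{D}_m) \le md$ and $H(\mu, \mathcal{D}_{n+m} | \mathcal{D}_n) \le md$.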
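
Concavity and the "convexity" bound can be checked numerically on probability vectors. The sketch below is only an illustration: it assumes the standard two-measure form of the bound, $H(\alpha p + (1-\alpha) q) \le \alpha H(p) + (1-\alpha) H(q) + H(\alpha, 1-\alpha)$, and the vectors `p`, `q` and the weight `a` are arbitrary choices.

```python
import math


def H(p):
    """Shannon entropy (base 2) of a probability vector, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mix(a, p, q):
    """The convex combination a*p + (1-a)*q of two probability vectors."""
    return [a * x + (1 - a) * y for x, y in zip(p, q)]


p = [0.7, 0.2, 0.1, 0.0]
q = [0.1, 0.1, 0.3, 0.5]
a = 0.3

lower = a * H(p) + (1 - a) * H(q)   # concavity: H(mix) is at least this
upper = lower + H([a, 1 - a])       # "convexity" bound: H(mix) is at most this
mixed = H(mix(a, p, q))
```

The quantity `mixed` lies between `lower` and `upper`, so mixing two measures can increase entropy by at most the entropy of the mixing weights.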

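Although the function $(\mu, m) \mapsto H(\mu, \mathcal{D}_m)$ is not weakly continuous, the following estimates provide usable substitutes.

Lemma 3.2. Let $\mu \in \mathcal{P}(\mathbb{R}^d)$ and $m \in \mathbb{N}$.

1. Given $m \in \mathbb{N}$ and $\mu \in \mathcal{P}(K)$ for some compact $K \subseteq \mathbb{R}^d$, there is a neighborhood $U \subseteq \mathcal{P}(K)$ of $\mu$ such that $|H(\nu, \mathcal{D}_m) - H(\mu, \mathcal{D}_m)| \le C_1$ for $\nu \in U$.

2. If $\mathcal{E}, \mathcal{F}$ are partitions and each $E \in \mathcal{E}$ intersects at most $C_2$ elements of $\mathcal{F}$ and vice versa, then $|H(\mu, \mathcal{E}) - H(\mu, \mathcal{F})| \le C_2 \log C_2$.

3. If $f, g : \mathbb{R}^d \to \mathbb{R}^k$ and $\|f(x) - g(x)\| \le C_3 2^{-m}$ for $x \in \mathbb{R}^d$ then $|H(f\mu, \mathcal{D}_m) - H(g\mu, \mathcal{D}_m)| \le C_3'$ where $C_3'$ depends only on $C_3$ and $k$.

4. If $\nu(\cdot) = \mu(\cdot + x_0)$ then $|H(\mu, \mathcal{D}_m) - H(\nu, \mathcal{D}_m)| \le C_4$.

5. If $C_5^{-1} \le m'/m \le C_5$, then $|H(\mu, \mathcal{D}_m) - H(\mu, \mathcal{D}_{m'})| \le C_5'$ where $C_5'$ depends only on $C_5$ and $d$.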

Recall that the total variation distance between $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ is

$$\|\mu - \nu\| = \sup_A |\mu(A) - \nu(A)|,$$

where the supremum is over Borel sets $A$. This is a complete metric on $\mathcal{P}(\mathbb{R}^d)$.

Lemma 3.3. For any bounded Borel set $K \subseteq \mathbb{R}^d$ and any $m \in \mathbb{N}$, the function $\mathcal{P}(K) \to \mathbb{R}$, $\mu \mapsto H(\mu, \mathcal{D}_m)$, is uniformly continuous in the total variation metric.

3.2 Global entropy from local entropy 

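Recall from Section 2.2 the definition of the raw and re-scaled components $\mu_{x,n}$, $\mu^{x,n}$, and note that

$$H(\mu^{x,n}, \mathcal{D}_m) = H(\mu_{x,n}, \mathcal{D}_{n+m}). \tag{17}$$

Also,

$$\mathbb{E}_{i=n}\left(H_m(\mu^{x,i})\right) = \int \frac{1}{m} H(\mu^{x,n}, \mathcal{D}_m)\, d\mu(x)$$
$$= \frac{1}{m} \int H(\mu_{x,n}, \mathcal{D}_{n+m})\, d\mu(x)$$
$$= \frac{1}{m} H(\mu, \mathcal{D}_{n+m} | \mathcal{D}_n).$$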

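Lemma 3.4. For $r \ge 1$ and $\mu \in \mathcal{P}([-r,r]^d)$ and integers $m < n$,

$$H_n(\mu) = \mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i})\right) + O\left(\frac{m}{n} + \frac{\log r}{n}\right).$$

Proof. The statement is equivalent to

$$H_n(\mu) = \frac{1}{n} \sum_{i=0}^{n-1} \frac{1}{m} H(\mu, \mathcal{D}_{i+m} | \mathcal{D}_i) + O\left(\frac{m}{n} + \frac{\log r}{n}\right).$$

At the cost of adding $O(m/n)$ to the error term we can delete up to $m$ terms from the sum. Thus without loss of generality we may assume that $n/m \in \mathbb{N}$. When $m = 1$, iterating the conditional entropy formula gives

$$\sum_{i=0}^{n-1} H(\mu, \mathcal{D}_{i+1} | \mathcal{D}_i) = H(\mu, \mathcal{D}_n | \mathcal{D}_0) = H(\mu, \mathcal{D}_n) - O(\log r)$$

(since $\mu \in \mathcal{P}([-r,r]^d)$ implies $H(\mu, \mathcal{D}_0) = O(\log r)$), and the result follows on dividing by $n$. For general $m$, first decompose the sum according to the residue class of $i \bmod m$ and apply the above to each one:

$$\sum_{i=0}^{n-1} \frac{1}{m} H(\mu, \mathcal{D}_{i+m} | \mathcal{D}_i) = \frac{1}{m} \sum_{p=0}^{m-1} \left( \sum_{k=0}^{n/m-1} H(\mu, \mathcal{D}_{(k+1)m+p} | \mathcal{D}_{km+p}) \right) = \frac{1}{m} \sum_{p=0}^{m-1} H(\mu, \mathcal{D}_{n+p} | \mathcal{D}_p).$$

Dividing by $n$, the result follows from the bound

$$\left| \frac{1}{n} H(\mu, \mathcal{D}_{n+p} | \mathcal{D}_p) - H_n(\mu) \right| \le \frac{2m + \log r}{n},$$

which can be derived from the identities

$$H(\mu, \mathcal{D}_n) + H(\mu, \mathcal{D}_{n+p} | \mathcal{D}_n) = H(\mu, \mathcal{D}_{n+p}) = H(\mu, \mathcal{D}_p) + H(\mu, \mathcal{D}_{n+p} | \mathcal{D}_p)$$

together with the fact that $H(\mu, \mathcal{D}_p) \le p + \log r$ and $H(\mu, \mathcal{D}_{n+p} | \mathcal{D}_n) \le p$, and recalling that $0 \le p < m$. □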
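
The case $m = 1$ of the proof rests on the telescoping identity $\sum_{i=0}^{n-1} H(\mu, \mathcal{D}_{i+1} | \mathcal{D}_i) = H(\mu, \mathcal{D}_n) - H(\mu, \mathcal{D}_0)$. The following sketch checks it on an arbitrary finitely supported test measure on $[0,1)$; the function names and the measure are illustrative assumptions, and the conditional entropy is computed from its definition rather than as a difference of entropies.

```python
import math
from collections import defaultdict


def cell(x, n):
    """Index of the level-n dyadic cell containing x."""
    return math.floor(x * 2**n)


def entropy(mu, n):
    """H(mu, D_n) for a measure given as {point: mass}."""
    masses = defaultdict(float)
    for x, w in mu.items():
        masses[cell(x, n)] += w
    return -sum(w * math.log2(w) for w in masses.values() if w > 0)


def cond_entropy(mu, fine, coarse):
    """H(mu, D_fine | D_coarse) = sum over coarse cells F of mu(F) * H(mu_F, D_fine)."""
    groups = defaultdict(dict)
    for x, w in mu.items():
        groups[cell(x, coarse)][x] = w
    total = 0.0
    for part in groups.values():
        mass = sum(part.values())
        conditional = {x: w / mass for x, w in part.items()}
        total += mass * entropy(conditional, fine)
    return total


# arbitrary test measure: mass proportional to k+1 at the point k/37
mu = {k / 37: (k + 1) / 703 for k in range(37)}
n = 6
lhs = sum(cond_entropy(mu, i + 1, i) for i in range(n))
rhs = entropy(mu, n) - entropy(mu, 0)
```

Up to floating-point error `lhs` and `rhs` agree; here $H(\mu, \mathcal{D}_0) = 0$ because the measure sits in a single level-0 cell.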

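We have a similar lower bound for the entropy of a convolution in terms of convolutions of its components at each level.

Lemma 3.5. Let $r > 0$ and $\mu, \nu \in \mathcal{P}([-r,r]^d)$. Then for $m < n \in \mathbb{N}$,

$$H_n(\mu * \nu) \ge \mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i} * \nu^{y,i})\right) + O\left(\frac{1}{m} + \frac{m + \log r}{n}\right).$$

Proof. As in the previous proof, by introducing an error of $O(m/n)$ we can assume that $m$ divides $n$, and by the conditional entropy formula,

$$H(\mu * \nu, \mathcal{D}_n) = \sum_{k=0}^{n/m-1} H(\mu * \nu, \mathcal{D}_{(k+1)m} | \mathcal{D}_{km}) + H(\mu * \nu, \mathcal{D}_0)$$
$$= \sum_{k=0}^{n/m-1} H(\mu * \nu, \mathcal{D}_{(k+1)m} | \mathcal{D}_{km}) + O(\log r)$$

since $\mu * \nu$ is supported on $[-2r, 2r]^d$. Substituting the identity $\mu * \nu = \mathbb{E}_{i=k}(\mu_{x,i} * \nu_{y,i})$, and using concavity of entropy,

$$H(\mu * \nu, \mathcal{D}_n) = \sum_{k=0}^{n/m-1} H\left(\mathbb{E}_{i=km}(\mu_{x,i} * \nu_{y,i}), \mathcal{D}_{(k+1)m} | \mathcal{D}_{km}\right) + O(\log r)$$
$$\ge \sum_{k=0}^{n/m-1} \mathbb{E}_{i=km}\left(H(\mu_{x,i} * \nu_{y,i}, \mathcal{D}_{(k+1)m} | \mathcal{D}_{km})\right) + O(\log r)$$
$$= \sum_{k=0}^{n/m-1} \mathbb{E}_{i=km}\left(H(\mu^{x,i} * \nu^{y,i}, \mathcal{D}_m | \mathcal{D}_0)\right) + O(\log r)$$
$$= \sum_{k=0}^{n/m-1} \mathbb{E}_{i=km}\left(H(\mu^{x,i} * \nu^{y,i}, \mathcal{D}_m) + O(1)\right) + O(\log r)$$
$$= \sum_{k=0}^{n/m-1} m \cdot \mathbb{E}_{i=km}\left(H_m(\mu^{x,i} * \nu^{y,i})\right) + O\left(\frac{n}{m} + \log r\right),$$

where in the second-to-last equality we use the fact that $\mu^{x,i} * \nu^{y,i}$ is supported on $[0,2)^d$ and therefore meets $O(1)$ elements of $\mathcal{D}_0$. Dividing by $n$, we have shown that

$$H_n(\mu * \nu) \ge \frac{m}{n} \sum_{k=0}^{n/m-1} \mathbb{E}_{i=km}\left(H_m(\mu^{x,i} * \nu^{y,i})\right) + O\left(\frac{1}{m} + \frac{\log r}{n}\right).$$

Now do the same for the sum $k = p$ to $n/m + p$ for $p = 0, 1, \ldots, m-1$. Averaging the resulting expressions gives the lemma. □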

3.3 Atomicity and uniformity of components 

The following technical results allow us to pass from a measure to its component measures while preserving some of the concentration or uniformity properties of the original measure.

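Lemma 3.6. If $\mu \in \mathcal{P}([0,1])$ is $\varepsilon$-atomic and $1 \le m < \log(1/\varepsilon)$, then

$$\mathbb{P}_{i=m}\left(\mu^{x,i} \text{ is } 2^m\varepsilon\text{-atomic}\right) > 1 - \sqrt{2^{-m}\varepsilon}.$$

Proof. By definition $1 - \varepsilon$ of the mass of $\mu$ is concentrated on an interval $W$ of length $2\varepsilon$. For $D \in \mathcal{D}_m$ write $T_D$ for the surjective homothety $D \to [0,1)$ and $W^D = T_D W$. Take $\delta = \sqrt{2^m \varepsilon}$ (note $\delta < 1$) and let $\mathcal{E} \subseteq \mathcal{D}_m$ denote the family of cells $D$ such that

$$\mu_D(D \setminus W) = \mu^D([0,1] \setminus W^D) > \delta.$$

It follows that

$$\varepsilon > \mu([0,1] \setminus W) \ge \sum_{D \in \mathcal{E}} \mu(D \setminus W) = \sum_{D \in \mathcal{E}} \mu(D) \cdot \mu_D(D \setminus W) \ge \delta \cdot \mu(\cup \mathcal{E}),$$

so $\mu(\cup \mathcal{E}) < \varepsilon/\delta = \sqrt{2^{-m}\varepsilon}$. Hence $\mu(\cup(\mathcal{D}_m \setminus \mathcal{E})) > 1 - \sqrt{2^{-m}\varepsilon}$. Finally, for $D \in \mathcal{D}_m \setminus \mathcal{E}$ we have $\mu^D([0,1] \setminus W^D) \le \delta$ and $W^D$ is an interval of length $2^{m+1}\varepsilon$, which implies that $\mu^D$ is $2^m\varepsilon$-atomic, and the conclusion follows. □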

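Lemma 3.7. If $\mu \in \mathcal{P}([0,1])$ is $(\varepsilon, n)$-uniform then for every $1 \le m < n$,

$$\mathbb{P}_{0 \le i \le n}\left(\mu^{x,i} \text{ is } (\varepsilon', m)\text{-uniform}\right) > 1 - \varepsilon'$$

where $\varepsilon' = \sqrt{\varepsilon + O(m/n)}$. In particular there is a subset $I \subseteq \{0, \ldots, n\}$ with $|I| > (1 - \sqrt{\varepsilon'})n$ and

$$\mathbb{P}_{i=k}\left(\mu^{x,i} \text{ is } (\varepsilon', m)\text{-uniform}\right) > 1 - \sqrt{\varepsilon'} \quad \text{for } k \in I.$$

Proof. By Lemma 3.4 we have

$$\mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i})\right) = H_n(\mu) - O\left(\frac{m}{n}\right) > 1 - \left(\varepsilon + O\left(\frac{m}{n}\right)\right) = 1 - (\varepsilon')^2,$$

so the first statement follows by Markov's inequality. Let $I$ denote the set of $0 \le k \le n$ such that $\mathbb{P}_{i=k}(H_m(\mu^{x,i}) > 1 - \varepsilon') > 1 - \sqrt{\varepsilon'}$. Since $\mathbb{E}_{0 \le i \le n}(H_m(\mu^{x,i})) = \frac{1}{n+1} \sum_{k=0}^{n} \mathbb{E}_{i=k}(H_m(\mu^{x,i}))$, by Markov's inequality again, $|I| > (1 - \sqrt{\varepsilon'})n$, as claimed.

□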

We also need a simple covering lemma. 

Lemma 3.8. Let $I \subseteq \{1, \ldots, n\}$ and $\ell \in \mathbb{N}$ be given. Then there is a subset $I' \subseteq I$ such that $I \subseteq I' + [0,\ell]$ and $(i + [0,\ell]) \cap (j + [0,\ell]) = \emptyset$ for distinct $i, j \in I'$.

Proof. Define $I'$ inductively, starting with the least element of $I$ and at stage $k$ adding the least element of $I$ not covered by the sets $j + [0,\ell]$ for $j$ already in $I'$. □
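
The inductive construction in the proof is a greedy scan over the sorted elements of $I$. A minimal sketch follows; the function name `greedy_cover` and the sample input are illustrative, not taken from the paper.

```python
def greedy_cover(I, ell):
    """Return I' as in Lemma 3.8: the sets i + [0, ell], i in I', are disjoint and cover I."""
    chosen = []
    covered_up_to = None  # right endpoint of the last interval i + [0, ell]
    for i in sorted(I):
        if covered_up_to is None or i > covered_up_to:
            chosen.append(i)          # least element of I not yet covered
            covered_up_to = i + ell
    return chosen


I = [1, 2, 3, 7, 8, 15, 16, 17, 18, 30]
I_prime = greedy_cover(I, 3)
```

With `ell = 3` the scan keeps 1, 7, 15 and 30: each skipped element lies within distance 3 to the right of a kept one, and consecutive kept elements differ by more than 3, so the intervals do not meet.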






4 Convolutions 



4.1 The Berry-Esseen theorem and an entropy estimate 

For $\mu \in \mathcal{P}(\mathbb{R})$ let $m(\mu)$ denote the mean, or barycenter, of $\mu$, given by

$$m(\mu) = \int x \, d\mu(x),$$

and let $\mathrm{Var}(\mu)$ denote its variance:

$$\mathrm{Var}(\mu) = \int (x - m(\mu))^2 \, d\mu(x).$$

Recall that if $\mu_1, \ldots, \mu_k \in \mathcal{P}(\mathbb{R})$ then $\mu = \mu_1 * \ldots * \mu_k$ has mean $m(\mu) = \sum_{i=1}^{k} m(\mu_i)$ and $\mathrm{Var}(\mu) = \sum_{i=1}^{k} \mathrm{Var}(\mu_i)$.

The Gaussian with mean $m$ and variance $\sigma^2$ is given by $\gamma_{m,\sigma^2}(A) = \int_A \frac{1}{\sigma}\varphi((x - m)/\sigma)\,dx$, where $\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}|x|^2)$. The central limit theorem asserts that, for $\mu_1, \mu_2, \ldots \in \mathcal{P}(\mathbb{R}^d)$ of positive variance, the convolutions $\mu_1 * \ldots * \mu_k$ can be re-scaled so that the resulting measure is close in the weak sense to a Gaussian measure. The Berry-Esseen inequalities quantify the rate of this convergence. We use the following variant from [?].

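Theorem 4.1. Let $\mu_1, \ldots, \mu_k$ be probability measures on $\mathbb{R}$ with finite third moments $\rho_i = \int |x|^3 \, d\mu_i(x)$. Let $\mu = \mu_1 * \ldots * \mu_k$ and let $\gamma$ be the Gaussian measure with the same mean and variance as $\mu$. Then for any interval $I \subseteq \mathbb{R}$,

$$|\mu(I) - \gamma(I)| \le C_1 \cdot \frac{\sum_{i=1}^{k} \rho_i}{\mathrm{Var}(\mu)^{3/2}},$$

where $C_1 = C_1(d)$. In particular, if $\rho_i \le C$ and $\mathrm{Var}(\mu_i) \ge c$ for constants $c, C > 0$ then

$$|\mu(I) - \gamma(I)| = O_{c,C}(k^{-1/2}).$$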
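
The $k^{-1/2}$ rate can be observed numerically. The sketch below is an illustration under stated assumptions, not the variant cited in the text: it takes each $\mu_i$ to be a fair coin on $\{0,1\}$, so that $\mu$ is a binomial distribution, and measures the discrepancy with the matching Gaussian only over half-lines $(-\infty, t]$ (the Kolmogorov distance). The function name is hypothetical.

```python
import math


def kolmogorov_distance_binomial(k):
    """sup over t of |F(t) - Phi(t)|, where F is the CDF of a sum of k fair 0/1 coins
    and Phi is the CDF of the Gaussian with the same mean k/2 and variance k/4."""
    mean, sd = k / 2, math.sqrt(k) / 2

    def gaussian_cdf(t):
        return 0.5 * (1 + math.erf((t - mean) / (sd * math.sqrt(2))))

    dist, cdf = 0.0, 0.0
    for j in range(k + 1):
        g = gaussian_cdf(j)
        dist = max(dist, abs(cdf - g))   # left limit of F at the atom j
        cdf += math.comb(k, j) / 2**k    # add the binomial mass at j
        dist = max(dist, abs(cdf - g))   # value of F at the atom j
    return dist


d100 = kolmogorov_distance_binomial(100)
d400 = kolmogorov_distance_binomial(400)
```

Since $F$ is a step function and the Gaussian CDF is monotone, the supremum is attained at an atom, which is why only the values and left limits at the atoms are compared. Quadrupling $k$ roughly halves the distance, in line with the $k^{-1/2}$ bound.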



4.2 Multiscale analysis of repeated self-convolutions 

In this section we show that for any measure $\mu$, every $\delta > 0$, every integer scale $m \ge 2$, and appropriately large $k$, the following holds: typical level-$i$ components of the convolution $\mu^{*k}$ are $(\delta, m)$-uniform, unless in $\mu$ the level-$i$ components are typically $2^{-m}$-atomic. The main idea is to apply the Berry-Esseen theorem to convolutions of component measures.

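Proposition 4.2. Let $\sigma > 0$, $\delta > 0$, and $m \ge 2$ an integer. Then there exists an integer $p = p_0(\sigma, \delta, m)$ such that for all $k \ge k_0(\sigma, \delta, m)$, the following holds:

Let $\mu_1, \ldots, \mu_k \in \mathcal{P}([0,1])$, let $\mu = \mu_1 * \ldots * \mu_k$, and suppose that $\mathrm{Var}(\mu) \ge \sigma k$. Then

$$\mathbb{P}_{i = p - [\log \sqrt{k}]}\left(\mu^{x,i} \text{ is } (\delta, m)\text{-uniform}\right) > 1 - \delta. \tag{18}$$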

Proof. It is a general fact that, for an absolutely continuous probability measure $\gamma$, for $\gamma$-a.e. $x$, as $p \to \infty$ the components $\gamma^{x,p}$ converge weak-* to Lebesgue measure on $[0,1]$, and in particular $\mathbb{E}_{i=p}(H_m(\gamma^{x,i})) \to 1$ as $p \to \infty$. In general this is a consequence of the martingale convergence theorem or the Lebesgue differentiation theorem, and there is no guaranteed rate of convergence, but if $\gamma$ has a continuous density function $f$, then convergence holds at every $x$ for which $f(x) > 0$, and the rate depends only on $f(x)$ and on the modulus of continuity of $f$ at $x$. In particular for the family of Gaussians with mean and variance in a given compact interval convergence is uniform in $x$ and in the measure. Therefore, given $\sigma, \delta > 0$ there is a $p = p_0(\sigma, \delta, m)$ such that $\mathbb{P}_{i=p}(H_m(\gamma^{x,i}) > 1 - \delta) > 1 - \delta$ for any Gaussian $\gamma$ with $\mathrm{Var}(\gamma) \ge \sigma$.

Now, if $\mu_i$ and $\mu$ are as in the statement and $\mu'$ is $\mu$ scaled by $2^{-[\log \sqrt{k}]}$ (which is up to a constant factor the same as $1/\sqrt{k}$), then by the Berry-Esseen theorem (Theorem 4.1) $\mu'$ agrees with the Gaussian of the same mean and variance on intervals of length $2^{-p-m}$ to a degree that can be made arbitrarily small by making $k$ large in a manner depending on $\sigma, p$. In particular for large enough $k$ this guarantees that $\mathbb{P}_{i=p}(H_m((\mu')^{x,i}) > 1 - \delta) > 1 - \delta$.

All that remains is to adjust the scale by a factor of $2^{[\log \sqrt{k}]}$. Then the same argument applied to $\mu$ instead of the scaled $\mu'$ gives $\mathbb{P}_{i = p - [\log \sqrt{k}]}(H_m(\mu^{x,i}) > 1 - \delta) > 1 - \delta$, which is (18). □

We turn to repeated self-convolutions. 

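Proposition 4.3. Let $\sigma, \delta > 0$ and $m \ge 2$ an integer. Then there exists $p = p_1(\sigma, \delta, m)$ such that for sufficiently large $k \ge k_1(\sigma, \delta, m)$, the following holds.
Let $\mu \in \mathcal{P}([0,1])$, fix an integer $i_0 \ge 0$, and write

$$\lambda = \mathbb{E}_{i=i_0}\left(\mathrm{Var}(\mu^{x,i})\right).$$

If $\lambda > \sigma$ then for $j_0 = i_0 - [\log \sqrt{k}] + p$ and $\nu = \mu^{*k}$ we have

$$\mathbb{P}_{j=j_0}\left(\nu^{x,j} \text{ is } (\delta, m)\text{-uniform}\right) > 1 - \delta.$$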

Proof. Let $\mu$ and $m$ be given. Fix $p$ and $k$ (we will later see how large they must be). Let $i_0$ be as in the statement and $j_0 = i_0 - [\log \sqrt{k}] + p$.

Let $\tilde\mu$ denote the $k$-fold self-product $\tilde\mu = \mu \times \ldots \times \mu$ and let $\pi : \mathbb{R}^k \to \mathbb{R}$ denote the addition map

$$\pi(x_1, \ldots, x_k) = \sum_{i=1}^{k} x_i.$$

Then $\nu = \pi\tilde\mu$, and, since $\tilde\mu = \mathbb{E}_{i=i_0}(\tilde\mu_{x,i})$, we also have by linearity $\nu = \mathbb{E}_{i=i_0}(\pi\tilde\mu_{x,i})$. By concavity of entropy and an application of Markov's inequality, there is a $\delta_1 > 0$, depending only on $\delta$, such that the proposition will follow if we show that with probability $> 1 - \delta_1$ over the choice of the component $\tilde\mu_{x,i_0}$ of $\tilde\mu$, the measure $\eta = \pi\tilde\mu_{x,i_0}$ satisfies

$$\mathbb{P}_{j=j_0}\left(\eta^{y,j} \text{ is } (\delta_1, m)\text{-uniform}\right) > 1 - \delta_1. \tag{19}$$

The random component $\tilde\mu_{x,i_0}$ is itself a product measure $\tilde\mu_{x,i_0} = \mu_{x_1,i_0} \times \ldots \times \mu_{x_k,i_0}$, and the marginal measures $\mu_{x_j,i_0}$ of this product are distributed independently according to the distribution of the raw components of $\mu$ at level $i_0$. Note that these components differ from the re-scaled components by a scaling factor of $2^{i_0}$, so the expected variance of the raw components is $2^{-2i_0}\lambda$. Recall that

$$\mathrm{Var}\left(\pi(\mu_{x_1,i_0} \times \ldots \times \mu_{x_k,i_0})\right) = \sum_{j=1}^{k} \mathrm{Var}(\mu_{x_j,i_0}).$$

Thus for any $\delta_2 > 0$, by the weak law of large numbers$^6$, if $k$ is large enough in a manner depending on $\delta_2$ then with probability $> 1 - \delta_2$ over the choice of $\tilde\mu_{x,i_0}$ we will have

$$\left| \frac{1}{k} \mathrm{Var}(\pi\tilde\mu_{x,i_0}) - 2^{-2i_0}\lambda \right| < 2^{-2i_0}\delta_2. \tag{20}$$

We can choose $\delta_2$ small in a manner depending on $\sigma$, so (20) implies

$$\mathrm{Var}(\pi\tilde\mu_{x,i_0}) > 2^{-2i_0} \cdot k\sigma/2. \tag{21}$$

But now inequality (19) follows from an application of Proposition 4.2 with proper choice of parameters. □

Lemma 4.4. Any $\mu \in \mathcal{P}([0,1])$ is $\mathrm{Var}(\mu)^{1/3}$-atomic. Conversely, if $\mu$ is $\varepsilon$-atomic, $\varepsilon < 1$, then $\mathrm{Var}(\mu) < \varepsilon$.

Proof. The second statement is trivial; the first is a simple consequence of Markov's inequality. □

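Theorem 4.5. Let $\delta > 0$ and $m \ge 2$ an integer. Then for $k \ge k_2(\delta, m)$ and all sufficiently large $n \ge n_2(\delta, m, k)$, the following holds:

For any $\mu \in \mathcal{P}([0,1])$ there are disjoint subsets $I, J \subseteq \{1, \ldots, n\}$ with $|I \cup J| > (1 - \delta)n$ such that, writing $\nu = \mu^{*k}$,

$$\mathbb{P}_{i=q}\left(\nu^{x,i} \text{ is } (\delta, m)\text{-uniform}\right) > 1 - \delta \quad \text{for } q \in I,$$
$$\mathbb{P}_{i=q}\left(\mu^{x,i} \text{ is } 2^{-m}\text{-atomic}\right) > 1 - \delta \quad \text{for } q \in J.$$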

Proof. Let $\delta$ and $m$ be given; we may assume $\delta < 1/2$.

Let $\rho : (0,1] \to (0,1]$ be a function such that $\rho(\sigma)$ is small in a manner depending on $\sigma, \delta, m$. We shall specify the exact requirements in the course of the proof, and one may, if desired, collect them to give an explicit formula for $\rho$. We note that the definition of $\rho$ uses the functions $k_1(\cdot)$ and $p_1(\cdot)$ from Proposition 4.3 and we assume, without loss of generality, that these functions are monotone in each of their arguments.

Our first condition on $\rho$ will be that $\rho(\sigma) < \sigma$. Consider the decreasing sequence $\sigma_0 > \sigma_1 > \ldots$ defined by $\sigma_0 = 1$ and $\sigma_i = \rho(\sigma_{i-1})$. Assume that $k \ge k_1(\sigma_{\lceil 1 + 1/\delta^2 \rceil}, \delta, m)$; this expression can be taken for $k_2(\delta, m)$.

Fix $\mu$ and $n$ large; we shall later see how large an $n$ is desirable. For $0 \le q \le n$ write

$$\lambda_q = \mathbb{E}_{i=q}\left(\mathrm{Var}(\mu^{x,i})\right).$$

Since the intervals $(\sigma_i, \sigma_{i-1}]$ are disjoint, there is an integer $1 \le s \le 1 + 1/\delta^2$ such that $\mathbb{P}_{0 \le q \le n}(\lambda_q \in (\sigma_s, \sigma_{s-1}]) < \delta^2$. For this $s$ define

$$\sigma = \sigma_{s-1},$$
$$\rho = \rho(\sigma) = \sigma_s,$$

and set

$$I' = \{0 \le q \le n : \lambda_q > \sigma\},$$
$$J' = \{0 \le q \le n : \lambda_q < \rho\}.$$



$^6$ We use here the fact that we have a uniform bound for the rate of convergence in the weak law of large numbers for i.i.d. random variables $X_1, X_2, \ldots$. In fact, the rate can be bounded in terms of the mean and variance of $X_1$. Here $X_i$ is distributed like the variance $\mathrm{Var}(\mu_{x,i_0})$ of a random component of level $i_0$, and the mean and variance of $X_i$ are bounded independently of $\mu \in \mathcal{P}([0,1])$.

Then by our choice of $s$,

$$|I' \cup J'| > (1 - \delta^2)n.$$

We also write

$$p = p_1(\sigma, \delta, m)$$

and note that $k \ge k_1(\sigma, \delta, m)$.
Let $\ell \ge 0$ be the integer

$$\ell = [\log \sqrt{k}] - p.$$

Since we may assume $n$ large relative to $\ell$, by deleting at most $\ell$ elements of $I'$ we can assume that $I' \subseteq [\ell, n]$ and that $|I' \cup J'| > (1 - \delta^2)n$ still holds. Let

$$I = I' - \ell.$$

By our choice of parameters and the previous proposition,

$$\mathbb{P}_{i=q}\left(\nu^{x,i} \text{ is } (\delta, m)\text{-uniform}\right) > 1 - \delta \quad \text{for } q \in I,$$

and also

$$\mathbb{E}_{i=q}\left(\mathrm{Var}(\mu^{x,i})\right) = \lambda_q < \rho \quad \text{for } q \in J'.$$

By Markov's inequality,

$$\mathbb{P}_{i=q}\left(\mathrm{Var}(\mu^{x,i}) < \sqrt{\rho}\right) > 1 - \sqrt{\rho} \quad \text{for } q \in J'.$$

By Lemma 4.4 this implies

$$\mathbb{P}_{i=q}\left(\mu^{x,i} \text{ is } \rho^{1/6}\text{-atomic}\right) > 1 - \sqrt{\rho} \quad \text{for } q \in J'.$$

Thus $I, J'$ almost satisfy the conclusion of the theorem, except that they are not disjoint (even though $I', J'$ were). To correct this, let

$$L = \left[\frac{1}{\delta^2}(\ell + 1)\right].$$

By Lemma 3.6 and Lemma 3.8 and assuming as we may that $n$ is sufficiently large compared to $m$, we can find $J'' \subseteq J'$ such that $J' \subseteq \bigcup_{q \in J''} [q, q+L]$, the union is disjoint, and

$$\mathbb{P}_{i=t}\left(\mu^{x,i} \text{ is } 2^L\rho^{1/6}\text{-atomic}\right) > (1 - \sqrt{\rho})(1 - \rho^{1/12}) \tag{22}$$

for $t \in \bigcup_{q \in J''} [q, q+L]$.

Let

$$J = \bigcup_{q \in J''} [q + \ell + 1, q + L].$$

The union is disjoint and by definition of $L$ and $J' \subseteq \bigcup_{q \in J''} [q, q+L]$ its size is

$$|J| \ge (1 - \delta^2) \Big| \bigcup_{q \in J''} [q, q+L] \Big| \ge (1 - \delta^2)|J'|.$$

Assume now that $\rho(\cdot)$ is such that $2^L\rho^{1/6} < 2^{-m}$ and $(1 - \sqrt{\rho})(1 - \rho^{1/12}) > 1 - \delta^2$. Then from the above we have

$$\mathbb{P}_{i=q}\left(\mu^{x,i} \text{ is } 2^{-m}\text{-atomic}\right) > 1 - \delta^2 \quad \text{for } q \in J.$$

We will be done if we show that $I \cap J = \emptyset$, since, using $\delta < 1/2$ and $|I' \cup J'| > (1 - \delta^2)n$, this implies

$$|I \cup J| = |I| + |J| \ge |I'| + (1 - \delta^2)|J'| \ge (1 - \delta^2)|I' \cup J'| \ge (1 - \delta^2)^2 n > (1 - \delta)n.$$

To see that $I \cap J = \emptyset$, suppose $t \in I \cap J$. Let

$$\pi = \mathbb{P}_{i=t+\ell}\left(\mathrm{Var}(\mu^{x,i}) < \frac{\sigma}{2}\right).$$

Since $t \in J$, inequality (22) holds for $t$. By Lemma 3.6 and the relation between atomicity and variance, if $\rho(\cdot)$ is suitably defined we will have $\pi > (2 - 2\sigma)/(2 - \sigma)$, hence

$$\lambda_{t+\ell} = \mathbb{E}_{i=t+\ell}\left(\mathrm{Var}(\mu^{x,i})\right) < \frac{\sigma}{2}\pi + (1 - \pi) < \sigma.$$

On the other hand $t \in I$ means that $t + \ell \in I'$ and we should have $\lambda_{t+\ell} > \sigma$. This contradiction shows that $I \cap J = \emptyset$ and completes the proof. □

4.3 The Kaimanovich-Vershik-Tao lemma 

The second ingredient in the proof of Theorem 2.7 is the following:

Lemma 4.6 (Kaimanovich-Vershik [15], Tao [31]). Let $\Gamma$ be a countable abelian group and let $\mu, \nu \in \mathcal{P}(\Gamma)$ be probability measures with $H(\mu) < \infty$, $H(\nu) < \infty$. Let

$$\delta_k = H(\mu * \nu^{*(k+1)}) - H(\mu * \nu^{*k}).$$

Then $\delta_k$ is non-increasing in $k$. In particular,

$$H(\mu * (\nu^{*k})) \le H(\mu) + k \cdot (H(\mu * \nu) - H(\mu)).$$

This is the entropy analog of the Plünnecke-Ruzsa inequality in additive combinatorics, which states that if $A, B \subseteq \mathbb{Z}$ are finite sets then $|A + B|$ controls to some degree the growth of $|A_0 + B^{\oplus k}|$, where $A_0 \subseteq A$ has size comparable to $A$. The result originates in a study by Kaimanovich and Vershik of random walks on groups and a version of it was recently rediscovered by Tao [31]. For completeness we give the proof.

Proof. Let $X_0$ be a random variable distributed according to $\mu$, let $Z_n$ be distributed according to $\nu$, and let all variables be independent. Set $X_n = X_0 + Z_1 + \ldots + Z_n$, so the distribution of $X_n$ is just $\mu * \nu^{*n}$. Furthermore, since $\Gamma$ is abelian, given $Z_1 = g$, the distribution of $X_n$ is the same as the distribution of $X_{n-1} + g$ and hence $H(X_n | Z_1) = H(X_{n-1})$. We now compute:

$$H(Z_1 | X_n) = H(Z_1, X_n) - H(X_n)$$
$$= H(Z_1) + H(X_n | Z_1) - H(X_n)$$
$$= H(\nu) + H(\mu * \nu^{*(n-1)}) - H(\mu * \nu^{*n}). \tag{23}$$

Since $X_n$ is a Markov process, given $X_n$, $Z_1 = X_1 - X_0$ is independent of $X_{n+1}$, so

$$H(Z_1 | X_n) = H(Z_1 | X_n, X_{n+1}) \le H(Z_1 | X_{n+1}).$$

Using (23) in both sides of the inequality above, we find that

$$H(\mu * \nu^{*(n-1)}) - H(\mu * \nu^{*n}) \le H(\mu * \nu^{*n}) - H(\mu * \nu^{*(n+1)}),$$

which is what we claimed. □
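
The monotonicity of the entropy increments can be observed on a small example. The sketch below is illustrative only: it takes the group to be the cyclic group $\mathbb{Z}/64$, and the measures `mu` and `nu` are arbitrary choices, as are the function names.

```python
import math


def convolve(p, q):
    """Convolution of two probability vectors on the cyclic group Z/N."""
    N = len(p)
    out = [0.0] * N
    for a, pa in enumerate(p):
        if pa == 0:
            continue
        for b, qb in enumerate(q):
            out[(a + b) % N] += pa * qb
    return out


def H(p):
    """Shannon entropy in base 2."""
    return -sum(x * math.log2(x) for x in p if x > 0)


N = 64
mu = [0.0] * N
mu[0], mu[1] = 0.5, 0.5
nu = [0.0] * N
nu[0], nu[3], nu[10] = 0.5, 0.3, 0.2

entropies = []            # entropies[k] = H(mu * nu^{*k})
walk = mu
for k in range(8):
    entropies.append(H(walk))
    walk = convolve(walk, nu)

# deltas[k] = H(mu * nu^{*(k+1)}) - H(mu * nu^{*k})
deltas = [b - a for a, b in zip(entropies, entropies[1:])]
```

The list `deltas` is non-increasing, and summing the first $k$ increments gives the bound $H(\mu * \nu^{*k}) \le H(\mu) + k\,\delta_0$ with $\delta_0 = H(\mu * \nu) - H(\mu)$.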

For the analogous statement for the scale-$n$ entropy of measures on $\mathbb{R}$ we use a discretization argument. For $m \in \mathbb{N}$ let

$$M_m = \left\{ \frac{k}{2^m} : k \in \mathbb{Z} \right\}$$

denote the group of $2^m$-adic rationals. Each $D \in \mathcal{D}_m$ contains exactly one $x \in M_m$. Define the $m$-discretization map $\sigma_m : \mathbb{R} \to M_m$ by $\sigma_m(x) = v$ if $\mathcal{D}_m(x) = \mathcal{D}_m(v)$, so that $\sigma_m(x) \in \mathcal{D}_m(x)$.

We say that a measure $\mu \in \mathcal{P}(\mathbb{R})$ is $m$-discrete if it is supported on $M_m$, and for arbitrary $\mu$ define its $m$-discretization to be its push-forward through $\sigma_m$, explicitly:

$$\mu^{(m)} = \sum_{v \in M_m} \mu(\mathcal{D}_m(v)) \cdot \delta_v.$$

Clearly $H_m(\mu) = H_m(\mu^{(m)})$.
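
For a finitely supported measure the discretization is a one-line push-forward. The following sketch assumes the dyadic cells are the half-open intervals $[k/2^m, (k+1)/2^m)$, so that the unique point of $M_m$ in the cell of $x$ is its left endpoint; the function names and the dictionary representation are illustrative.

```python
import math
from collections import defaultdict


def sigma(x, m):
    """m-discretization map: the point of M_m lying in the level-m dyadic cell of x."""
    return math.floor(x * 2**m) / 2**m


def discretize(mu, m):
    """Push-forward of mu = {point: mass} through sigma_m."""
    out = defaultdict(float)
    for x, w in mu.items():
        out[sigma(x, m)] += w
    return dict(out)


def entropy(mu, m):
    """H(mu, D_m) for a measure given as {point: mass}."""
    masses = defaultdict(float)
    for x, w in mu.items():
        masses[math.floor(x * 2**m)] += w
    return -sum(w * math.log2(w) for w in masses.values() if w > 0)
```

For instance, with $m = 3$ the measure with masses $0.2, 0.3, 0.5$ at $0.1, 0.12, 0.8$ is pushed to masses $0.5, 0.5$ at $0$ and $0.75$, and its entropy with respect to $\mathcal{D}_3$ is unchanged.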

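Lemma 4.7. Given $\mu_1, \ldots, \mu_k \in \mathcal{P}(\mathbb{R})$ with $H(\mu_i) < \infty$ and $m \in \mathbb{N}$,

$$\left| H_m(\mu_1 * \mu_2 * \ldots * \mu_k) - H_m(\mu_1^{(m)} * \ldots * \mu_k^{(m)}) \right| = O(k/m).$$

Proof. Let $\pi : \mathbb{R}^k \to \mathbb{R}$ denote the map $(x_1, \ldots, x_k) \mapsto \sum_{i=1}^{k} x_i$. Then $\mu_1 * \ldots * \mu_k = \pi(\mu_1 \times \ldots \times \mu_k)$ and $\mu_1^{(m)} * \ldots * \mu_k^{(m)} = \pi \circ \sigma_m(\mu_1 \times \ldots \times \mu_k)$ (here we extend $\sigma_m$ to $\mathbb{R}^k$ by $(x_1, \ldots, x_k) \mapsto (\sigma_m x_1, \ldots, \sigma_m x_k)$). Now, it is easy to check that

$$|\pi(x_1, \ldots, x_k) - \pi \circ \sigma_m(x_1, \ldots, x_k)| = O(k \cdot 2^{-m})$$

so the desired entropy bound follows from Lemma 3.2 (3). □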

Proposition 4.8. Let $\mu, \nu \in \mathcal{P}(\mathbb{R})$ with $H_n(\mu), H_n(\nu) < \infty$. Then

$$H_n(\mu * (\nu^{*k})) \le H_n(\mu) + k \cdot (H_n(\mu * \nu) - H_n(\mu)) + O\left(\frac{k}{n}\right). \tag{24}$$

Proof. Writing $\tilde\mu = \mu^{(n)}$ and $\tilde\nu = \nu^{(n)}$, Lemma 4.6 implies

$$H(\tilde\mu * (\tilde\nu^{*k})) \le H(\tilde\mu) + k \cdot (H(\tilde\mu * \tilde\nu) - H(\tilde\mu)).$$

For $n$-discrete measures the entropy of the measure coincides with its entropy with respect to $\mathcal{D}_n$, so dividing this inequality by $n$ gives (24) for $\tilde\mu, \tilde\nu$ instead of $\mu, \nu$, and without the error term. The desired inequality follows from Lemma 4.7. □

We will also need the following simple fact later:
Corollary 4.9. For $m \in \mathbb{N}$ and $\mu, \nu \in \mathcal{P}([-r,r]^d)$ with $H_m(\mu), H_m(\nu) < \infty$,

$$H_m(\mu * \nu) \ge H_m(\mu) - O\left(\frac{1}{m}\right).$$

Proof. This is immediate from the identity $\mu * \nu = \int \mu * \delta_y \, d\nu(y)$, concavity of entropy, and Lemma 3.2 (4) (note that $\mu * \delta_y$ is a translate of $\mu$). □

4.4 Proof of the inverse theorem

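For convenience we recall the formulation of Theorem 2.7.

Theorem. For every $\varepsilon > 0$ and $m > m(\varepsilon)$, there exists a $\delta = \delta(\varepsilon, m)$ such that for all $n > n(\varepsilon, m, \delta)$, if $\nu, \mu \in \mathcal{P}([0,1])$ then either $H_n(\mu * \nu) \ge H_n(\mu) + \delta$, or there exist disjoint subsets $I, J \subseteq \{0, \ldots, n\}$ with $|I \cup J| \ge (1 - \varepsilon)n$ and

$$\mathbb{P}_{i=k}\left(\mu^{x,i} \text{ is } (\varepsilon, m)\text{-uniform}\right) > 1 - \varepsilon \quad \text{for } k \in I,$$
$$\mathbb{P}_{i=k}\left(\nu^{x,i} \text{ is } 2^{-m}\text{-atomic}\right) > 1 - \varepsilon \quad \text{for } k \in J.$$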

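Proof. Fix $\varepsilon, m$ and choose $k = k_2(\varepsilon, m)$ as in Theorem 4.5 with $\delta = \varepsilon/2$. We shall show that the conclusion holds if $n$ is large relative to the previous parameters.
Let $\mu, \nu \in \mathcal{P}([0,1))$. Denote

$$\tau = \nu^{*k}.$$

Assuming $n$ is large enough, Theorem 4.5 provides us with disjoint subsets $I, J \subseteq \{0, \ldots, n\}$ with $|I \cup J| > (1 - \varepsilon/2)n$ such that

$$\mathbb{P}_{i=k}\left(\tau^{x,i} \text{ is } (\tfrac{\varepsilon}{2}, m)\text{-uniform}\right) > 1 - \frac{\varepsilon}{2} \quad \text{for } k \in I \tag{25}$$

and

$$\mathbb{P}_{i=k}\left(\nu^{x,i} \text{ is } 2^{-m}\text{-atomic}\right) > 1 - \frac{\varepsilon}{2} \quad \text{for } k \in J. \tag{26}$$

Let $I_0 \subseteq I$ denote the set of $k$ such that

$$\mathbb{P}_{i=k}\left(\mu^{x,i} \text{ is } (\varepsilon, m)\text{-uniform}\right) > 1 - \varepsilon. \tag{27}$$

If $|I_0 \cup J| > (1 - \varepsilon)n$ we are done, since by (26) and (27), the pair $I_0, J$ satisfies the second alternative of the theorem.

Otherwise, let $I_1 = I \setminus I_0$, so that $|I_1| = |I| - |I_0| > \varepsilon n/2$. We have

$$\mathbb{P}_{i=k}\left(\tau^{y,i} \text{ is } (\tfrac{\varepsilon}{2}, m)\text{-uniform and } \mu^{x,i} \text{ is not } (\varepsilon, m)\text{-uniform}\right) > \frac{\varepsilon}{2} \quad \text{for } k \in I_1.$$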

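For $\mu^{x,i}, \tau^{y,i}$ in the event above, this just means that $H_m(\tau^{y,i}) > H_m(\mu^{x,i}) + \varepsilon/2$ and hence $H_m(\mu^{x,i} * \tau^{y,i}) \ge H_m(\mu^{x,i}) + \varepsilon/2 - O(1/m)$. For any other pair $\mu^{x,i}, \tau^{y,i}$ we have the trivial bound $H_m(\mu^{x,i} * \tau^{y,i}) \ge H_m(\mu^{x,i}) - O(1/m)$. Thus, using Lemmas 3.4 and 3.5,

$$H_n(\mu * \tau) \ge \mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i} * \tau^{y,i})\right) + O\left(\frac{1}{m} + \frac{m}{n}\right)$$
$$= \frac{|I_1|}{n+1} \mathbb{E}_{i \in I_1}\left(H_m(\mu^{x,i} * \tau^{y,i})\right) + \frac{n + 1 - |I_1|}{n+1} \mathbb{E}_{i \in I_1^c}\left(H_m(\mu^{x,i} * \tau^{y,i})\right) + O\left(\frac{1}{m} + \frac{m}{n}\right)$$
$$\ge \frac{|I_1|}{n+1} \left( \mathbb{E}_{i \in I_1}\left(H_m(\mu^{x,i})\right) + \left(\frac{\varepsilon}{2}\right)^2 \right) + \frac{n + 1 - |I_1|}{n+1} \mathbb{E}_{i \in I_1^c}\left(H_m(\mu^{x,i})\right) + O\left(\frac{1}{m} + \frac{m}{n}\right)$$
$$\ge \mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i})\right) + \left(\frac{\varepsilon}{2}\right)^3 + O\left(\frac{1}{m} + \frac{m}{n}\right)$$
$$= H_n(\mu) + \left(\frac{\varepsilon}{2}\right)^3 + O\left(\frac{1}{m} + \frac{m}{n}\right).$$

So, assuming that $\varepsilon$ was sufficiently small to begin with, $m$ large with respect to $\varepsilon$ and $n$ large with respect to $m$, we have

$$H_n(\mu * \tau) > H_n(\mu) + \frac{\varepsilon^3}{10}.$$

On the other hand, by Proposition 4.8 above,

$$H_n(\mu * \tau) = H_n(\mu * \nu^{*k}) \le H_n(\mu) + k \cdot (H_n(\mu * \nu) - H_n(\mu)) + O\left(\frac{k}{n}\right).$$

Assuming that $n$ is large enough in a manner depending on $\varepsilon$ and $k$, this and the previous inequality give

$$H_n(\mu * \nu) \ge H_n(\mu) + \frac{\varepsilon^3}{100k}.$$

This completes the proof of Theorem 2.7 for $\delta = \varepsilon^3/100k$. □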

Theorems 2.8 and 2.9 are formal consequences of Theorem 2.7, as discussed in Section 2.3.



5 Self-similar measures 

5.1 Uniform entropy dimension and self-similar measures 

The entropy dimension of a measure $\theta \in \mathcal{P}(\mathbb{R})$ is the limit $\alpha = \lim_{n \to \infty} H_n(\theta)$, assuming it exists; by Lemma 3.4 this is equivalent to $\lim_{n \to \infty} \mathbb{E}_{0 \le i \le n}(H_m(\theta^{x,i})) = \alpha$ for all integers $m$. However, the convergence of the averages does not imply that the entropies of the components $\theta^{x,i}$ concentrate around their mean, and examples show that they need not. We introduce the following stronger notion:

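Definition 5.1. A measure $\theta \in \mathcal{P}(\mathbb{R})$ has uniform entropy dimension $\alpha$ if for every $\varepsilon > 0$, for large enough $m$,

$$\liminf_{n \to \infty} \mathbb{P}_{0 \le i \le n}\left(|H_m(\theta^{x,i}) - \alpha| < \varepsilon\right) > 1 - \varepsilon. \tag{28}$$

Our main objective in this section is to prove:

Proposition 5.2. Let $\mu \in \mathcal{P}(\mathbb{R})$ be a self-similar measure and $\alpha = \dim \mu$. Then $\mu$ has uniform entropy dimension $\alpha$.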

For simplicity we first consider the case that all the contractions in the IFS contract by the same ratio $r$. Thus, consider an IFS $\Phi = \{\varphi_i\}_{i \in \Lambda}$ with $\varphi_i(x) = r(x - a_i)$, $0 < r < 1$. We denote the attractor by $X$ and without loss of generality assume that $0 \in X \subseteq [0,1]$, which can always be arranged by a change of coordinates and may be seen not to affect the conclusions. Let $\mu = \sum_{i \in \Lambda} p_i \cdot \varphi_i\mu$ be a self-similar measure and as usual write $\varphi_i = \varphi_{i_1} \circ \ldots \circ \varphi_{i_n}$ and $p_i = p_{i_1} \cdot \ldots \cdot p_{i_n}$ for $i \in \Lambda^n$.

Let

$$\alpha = \dim \mu.$$

As we have noted already, self-similar measures have entropy dimension:

$$\lim_{n \to \infty} H_n(\mu) = \alpha. \tag{29}$$

Fix $\tilde{x} \in X$ and define probability measures

$$\mu^{[n]}_{x,k} = c \cdot \sum \left\{ p_i \cdot \varphi_i\mu : i \in \Lambda^n,\ \varphi_i\tilde{x} \in \mathcal{D}_k(x) \right\},$$

where $c = c(x, \tilde{x}, k, n)$ is a normalizing constant. Thus $\mu^{[n]}_{x,k}$ differs from $\mu_{x,k}$ in that, instead of restricting $\mu = \sum_{i \in \Lambda^n} p_i \cdot \varphi_i\mu$ to $\mathcal{D}_k(x)$, we include or exclude each term in its entirety depending on whether $\varphi_i\tilde{x} \in \mathcal{D}_k(x)$. Since $\varphi_i\mu$ may not be supported entirely on either $\mathcal{D}_k(x)$ or its complement, in general we have neither $\mu^{[n]}_{x,k} \ll \mu_{x,k}$ nor $\mu_{x,k} \ll \mu^{[n]}_{x,k}$. Note that the definition of $\mu^{[n]}_{x,k}$ depends on the point $\tilde{x}$, but this will not concern us.

For $0 < \rho < 1$ it will be convenient to write

$$\ell(\rho) = \lceil \log \rho / \log r \rceil,$$

so $\rho$, $r^{\ell(\rho)}$ differ by a multiplicative constant. Recall that $\|\cdot\|$ denotes the total variation norm, see Section 3.1.
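
As a quick sanity check on this notation, the sketch below computes $\ell(\rho)$ by direct search instead of logarithms, which sidesteps floating-point rounding at exact powers of $r$. It assumes the bracket denotes rounding up, so that $\ell(\rho)$ is the least integer $\ell \ge 0$ with $r^\ell \le \rho$; the function name `ell` is hypothetical.

```python
def ell(rho, r):
    """Least integer l >= 0 with r**l <= rho, for 0 < rho < 1 and 0 < r < 1.

    This is the ceiling reading of l(rho) = [log rho / log r] (an assumption).
    """
    l = 0
    while r**l > rho:
        l += 1
    return l


l = ell(2**-5, 1 / 3)
```

With $r = 1/3$ and $\rho = 2^{-5}$ the search returns 4, and $r \cdot \rho < r^{4} \le \rho$, which is the sense in which $\rho$ and $r^{\ell(\rho)}$ differ by a multiplicative constant (here at most $1/r$).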



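Lemma 5.3. For every $\varepsilon > 0$ there is a $0 < \rho < 1$ such that, for all $k$ and $n = \ell(\rho 2^{-k})$,

$$\mathbb{P}_{i=k}\left( \left\| \mu_{x,i} - \mu^{[n]}_{x,i} \right\| < \varepsilon \right) > 1 - \varepsilon. \tag{30}$$

Furthermore $\rho$ can be chosen independently of $\tilde{x}$ and of the coordinate system on $\mathbb{R}$ (so the same bound holds for any translate of $\mu$).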

Proof. It is elementary that if $\mu$ is atomic then it consists of a single atom. In this case the statement is trivial, so assume $\mu$ is non-atomic. Then$^7$ given $\varepsilon > 0$ there is a $\delta > 0$ such that every interval of length $\delta$ has $\mu$-mass $< \varepsilon^2/2$. Choose an integer $q$ so that $r^q < \delta/2$ and let $\rho = r^q$.

Let $k \in \mathbb{N}$ and $\ell = \ell(2^{-k})$, so that $2^{-k} \cdot r \le r^\ell \le 2^{-k}$. Let $i \in \Lambda^\ell$ and consider those $j \in \Lambda^q$ such that $\varphi_{ij}\mu$ is not supported on an element of $\mathcal{D}_k$. Then $\varphi_{ij}\mu$ is supported on the interval $J$ of length $\delta$ centered at one of the endpoints of an element of $\mathcal{D}_k$. Since $\varphi_i\mu$ can give positive mass to at most two such intervals $J$, and $\varphi_i\mu(J) < \varepsilon^2/2$ for each such $J$, we conclude that in the representation $p_i \cdot \varphi_i\mu = \sum_{j \in \Lambda^q} p_{ij} \cdot \varphi_{ij}\mu$, at least $1 - \varepsilon^2$ of the mass comes from terms that are supported entirely on just one element of $\mathcal{D}_k$. Therefore the same is true in the representation $\mu = \sum_{u \in \Lambda^{\ell+q}} p_u \cdot \varphi_u\mu$. The inequality (30) now follows by an application of the Markov inequality. Finally, since our choice of parameters did not depend on $\tilde{x}$ and is invariant under translation of $\mu$ and of the IFS, the last statement holds. □

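Lemma 5.4. For $\varepsilon > 0$, for large enough $m$ and all $k$,

$$\mathbb{P}_{i=k}\left(H_m(\mu^{x,i}) > \alpha - \varepsilon\right) > 1 - \varepsilon,$$

and the same holds for any translate of $\mu$.

Proof. Let $\varepsilon > 0$ be given. Choose $0 < \varepsilon' < \varepsilon$ sufficiently small that $\|\nu - \nu'\| < \varepsilon'$ implies $|H(\nu, \mathcal{D}_m) - H(\nu', \mathcal{D}_m)| < \varepsilon/2$ for every $\nu, \nu' \in \mathcal{P}([0,1]^d)$ (Lemma 3.3). Let $\rho$ be as in the previous lemma chosen with respect to $\varepsilon'$. Assume that $m$ is large enough that $|H_m(\mu') - \alpha| < \varepsilon/2$ whenever $\mu'$ is $\mu$ scaled by a factor of at most $\rho$ ($m$ exists by (29) and Lemma 3.2 (5)). Now fix $k$ and let $\ell = \ell(\rho 2^{-k})$. By the previous lemma and choice of $\varepsilon'$, it is enough to show that $\frac{1}{m} H(\mu^{[\ell]}_{x,k}, \mathcal{D}_{k+m}) > \alpha - \varepsilon/2$. But this follows from the fact that $\mu^{[\ell]}_{x,k}$ is a convex combination of measures $\varphi_j\mu$ for $j \in \Lambda^\ell$, our choice of $m$ and $\ell$, and concavity of entropy. □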



$^7$ This is the only part of the proof of Theorem 1.3 which is not effective, but with a little more work one could make it effective in the sense that, if $\liminf -\frac{1}{n} \log \Delta_n = M < \infty$, then at arbitrarily small scales one can obtain estimates of the continuity of $\mu$ in terms of $M$.

We now prove Proposition 5.2. Let $0 < \varepsilon < 1$ be given and fix an auxiliary parameter $\varepsilon' < \varepsilon/2$. We first show that (28) holds for $m$ large in a manner depending on $\varepsilon$. Specifically let $m$ be large enough that the previous lemma applies for the parameter $\varepsilon'$. In particular for any $n$,

$$\mathbb{P}_{0 \le i \le n}\left(H_m(\mu^{x,i}) > \alpha - \varepsilon'\right) > 1 - \varepsilon'. \tag{31}$$

By (29), for $n$ large enough we have $|H_n(\mu) - \alpha| < \varepsilon'/2$, so by Lemma 3.4 for large enough $n$ we have

$$\left| \mathbb{E}_{0 \le i \le n}\left(H_m(\mu^{x,i})\right) - \alpha \right| < \varepsilon'.$$

Since $H_m(\mu^{x,i}) \ge 0$, the last two inequalities imply

$$\mathbb{P}_{0 \le i \le n}\left(H_m(\mu^{x,i}) < \alpha + \varepsilon''\right) > 1 - \varepsilon''$$

for some $\varepsilon''$ that tends to $0$ with $\varepsilon'$. Thus, choosing $\varepsilon'$ small enough, the last inequality and (31) give (28), as desired.

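When the contraction ratios are not uniform, $\varphi_i = r_i x + a_i$, some minor changes are needed in the proof. Given $n$, let $\Lambda^{(n)}$ denote the set of $i \in \Lambda^* = \bigcup_{m=1}^{\infty} \Lambda^m$ such that $r_i < r^n \le r_j$, where $j$ is the same as $i$ but with the last symbol deleted (so its length is one less than $i$). This ensures that $\{r_i\}_{i \in \Lambda^{(n)}}$ are all within a multiplicative constant of each other (this constant is $\min\{r_j : j \in \Lambda\}$). It is easy to check that $\Lambda^{(n)}$ is a section of $\Lambda^*$ in the sense that every sequence $i \in \Lambda^*$ with $r_i < r^n$ has a unique prefix in $\Lambda^{(n)}$. Now define $\mu^{[n]}_{x,k}$ as before, but using $\varphi_i$ for $i \in \Lambda^{(n)}$, i.e.

$$\mu^{[n]}_{x,k} = c \cdot \sum \left\{ p_i \cdot \varphi_i\mu : i \in \Lambda^{(n)},\ \varphi_i\tilde{x} \in \mathcal{D}_k(x) \right\}.$$

With this modification all the previous arguments now go through.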
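
The section $\Lambda^{(n)}$ can be enumerated by extending words until the contraction ratio first drops below $r^n$. The sketch below is illustrative: the function name, the ratios and the value of $r$ are arbitrary choices, and the empty word is given ratio 1 so that words of length one are handled uniformly.

```python
def section(ratios, r, n):
    """List the pairs (word, r_word) with r_word < r**n <= r_parent, where the parent
    is the word with its last symbol deleted and the empty word has ratio 1."""
    threshold = r**n
    words = []

    def extend(word, ratio):
        for a, ra in enumerate(ratios):
            new_ratio = ratio * ra
            if new_ratio < threshold:
                words.append((word + (a,), new_ratio))  # first drop below r**n: stop here
            else:
                extend(word + (a,), new_ratio)          # still too coarse: keep extending

    extend((), 1.0)
    return words


ratios = [0.5, 0.25]
words = section(ratios, 0.4, 3)
```

Every ratio in the output lies in $[\min_j r_j \cdot r^n, r^n)$, no word in the output is a prefix of another, and with two symbols the lengths satisfy $\sum 2^{-|i|} = 1$, which reflects the fact that every sufficiently long sequence has exactly one prefix in the section.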

Finally, let us note the following consequence of the inverse theorem (Theorem 2.8).

Corollary 5.5. For every $0 < \alpha < 1$ and $\varepsilon > 0$ there is a $\delta > 0$ such that the following holds: If $\mu \in \mathcal{P}([0,1])$ has uniform entropy dimension $\alpha$, then for all large enough $n$ and every $\nu \in \mathcal{P}([0,1])$,

$$H_n(\nu) > \varepsilon \implies H_n(\mu * \nu) > H_n(\mu) + \delta.$$

Similar conclusions hold for dimension.



5.2 Proof of Theorem 1.3

We again begin with the uniformly contracting case, $\varphi_i = rx + a_i$, and continue with the notation from the previous section, in particular assume that $0$ is in the attractor. Recall from the introduction that

$$\nu^{(n)} = \sum_{i \in \Lambda^n} p_i \cdot \delta_{\varphi_i(0)}.$$

Define

$$\tau^{(n)}(A) = \mu(r^{-n}A).$$

One may verify easily, using the assumption $0 \in X$, that

$$\mu = \nu^{(n)} * \tau^{(n)}. \tag{32}$$
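
For a concrete IFS the measure $\nu^{(n)}$ and the conditional entropy $H(\nu^{(n)}, \mathcal{D}_{qn'} | \mathcal{D}_{n'})$ studied in this section can be computed exactly. The sketch below is an illustration under stated assumptions: it uses the middle-thirds Cantor system ($r = 1/3$, translations $0$ and $2/3$, equal weights), which has no overlaps, together with $n = 8$ and $q = 2$; exact rational arithmetic avoids rounding at cell boundaries, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product


def nu_n(r, translations, probs, n):
    """Atoms phi_i(0), i in Lambda^n, of the maps phi_a(x) = r*x + t_a, with masses p_i."""
    atoms = defaultdict(Fraction)
    for word in product(range(len(translations)), repeat=n):
        # phi_{i_1} o ... o phi_{i_n}(0) = t_{i_1} + r*t_{i_2} + ... + r^(n-1)*t_{i_n}
        x = sum(r**k * translations[a] for k, a in enumerate(word))
        atoms[x] += math.prod(probs[a] for a in word)
    return atoms


def entropy(atoms, n):
    """H(nu, D_n) with respect to dyadic cells of side 2^-n."""
    masses = defaultdict(float)
    for x, w in atoms.items():
        masses[math.floor(x * 2**n)] += float(w)
    return -sum(w * math.log2(w) for w in masses.values() if w > 0)


def cond_entropy(atoms, fine, coarse):
    """H(nu, D_fine | D_coarse); D_fine refines D_coarse, so this is a difference."""
    return entropy(atoms, fine) - entropy(atoms, coarse)


r = Fraction(1, 3)
atoms = nu_n(r, [Fraction(0), Fraction(2, 3)], [Fraction(1, 2)] * 2, 8)
n_prime = math.floor(8 * math.log2(3))   # n' = [n log(1/r)] with logarithm in base 2
gap = cond_entropy(atoms, 2 * n_prime, n_prime)
```

Here the 256 atoms are separated by more than $2^{-n'}$, so each level-$n'$ cell contains at most one atom and the conditional entropy `gap` vanishes. For systems with overlaps several atoms may share a cell, and the conditional entropy then measures how they are spread out at finer scales.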
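As in the introduction, write

$$n' = [n \log(1/r)].$$

Thus $\tau^{(n)}$ is $\mu$ scaled down by a factor of $r^n = 2^{-n'}$ and translated. Using (29), Lemma 3.2 and the fact that $\tau^{(n)}$ is supported on an interval of order $r^n = 2^{-n'}$, we have

$$\lim_{n \to \infty} \frac{1}{n'} H(\nu^{(n)}, \mathcal{D}_{n'}) = \lim_{n \to \infty} \frac{1}{n'} H(\mu, \mathcal{D}_{n'}) = \dim \mu = \alpha.$$

Suppose now that $\alpha < 1$. Fix a large $q$ and consider the identity

$$\frac{1}{qn'} H(\mu, \mathcal{D}_{qn'}) = \frac{n'}{qn'} \cdot \left( \frac{1}{n'} H(\mu, \mathcal{D}_{n'}) \right) + \frac{qn' - n'}{qn'} \cdot \left( \frac{1}{qn' - n'} H(\mu, \mathcal{D}_{qn'} | \mathcal{D}_{n'}) \right)$$
$$= \frac{1}{q} \cdot \left( \frac{1}{n'} H(\mu, \mathcal{D}_{n'}) \right) + \frac{q - 1}{q} \cdot \left( \frac{1}{qn' - n'} H(\mu, \mathcal{D}_{qn'} | \mathcal{D}_{n'}) \right).$$

The left hand side and the term $\frac{1}{n'} H(\mu, \mathcal{D}_{n'})$ on the right hand side both tend to $\alpha$ as $n \to \infty$. Since $r, q$ are independent of $n$ we conclude that

$$\lim_{n \to \infty} \frac{1}{qn' - n'} H(\mu, \mathcal{D}_{qn'} | \mathcal{D}_{n'}) = \alpha. \tag{33}$$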

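From the identity $\nu^{(n)} = \mathbb{E}_{i=n'}(\nu^{(n)}_{x,i})$ and linearity of convolution,

$$\mu = \nu^{(n)} * \tau^{(n)} = \mathbb{E}_{i=n'}\left(\nu^{(n)}_{x,i} * \tau^{(n)}\right).$$

Also, each measure $\nu^{(n)}_{x,n'} * \tau^{(n)}$ is supported on an interval of length $O(2^{-n'})$ so

$$\left| H(\nu^{(n)}_{x,n'} * \tau^{(n)}, \mathcal{D}_{qn'} | \mathcal{D}_{n'}) - H(\nu^{(n)}_{x,n'} * \tau^{(n)}, \mathcal{D}_{qn'}) \right| = O(1).$$

By concavity of conditional entropy (Lemma 3.1 (5)),

$$H(\mu, \mathcal{D}_{qn'} | \mathcal{D}_{n'}) = H(\nu^{(n)} * \tau^{(n)}, \mathcal{D}_{qn'} | \mathcal{D}_{n'})$$
$$\ge \mathbb{E}_{i=n'}\left(H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'} | \mathcal{D}_{n'})\right)$$
$$= \mathbb{E}_{i=n'}\left(H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'})\right) + O(1),$$

so by (33),

$$\limsup_{n \to \infty} \frac{1}{qn' - n'} \mathbb{E}_{i=n'}\left(H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'})\right) \le \alpha. \tag{34}$$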

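Now, we also know that

$$\lim_{n \to \infty} \frac{1}{qn' - n'} H(\tau^{(n)}, \mathcal{D}_{qn'}) = \alpha, \tag{35}$$

since, up to a re-scaling, this is just (29) (we again used the fact that $\tau^{(n)}$ is supported on intervals of length $2^{-n'}$). By Corollary 4.9, for every component $\nu^{(n)}_{x,n'}$,

$$\frac{1}{qn' - n'} H(\nu^{(n)}_{x,n'} * \tau^{(n)}, \mathcal{D}_{qn'}) \ge \frac{1}{qn' - n'} H(\tau^{(n)}, \mathcal{D}_{qn'}) + O\left(\frac{1}{qn' - n'}\right).$$

Therefore for every $\delta > 0$,

$$\lim_{n \to \infty} \mathbb{P}_{i=n'}\left( \frac{1}{qn' - n'} H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'}) > \alpha - \delta \right) = 1$$

which, combined with (34), implies that for every $\delta > 0$,

$$\lim_{n \to \infty} \mathbb{P}_{i=n'}\left( \left| \frac{1}{qn' - n'} H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'}) - \alpha \right| < \delta \right) = 1$$

and replacing $\alpha$ with the limit in (35), we have that for all $\delta > 0$,

$$\lim_{n \to \infty} \mathbb{P}_{i=n'}\left( \left| \frac{1}{qn' - n'} H(\nu^{(n)}_{x,i} * \tau^{(n)}, \mathcal{D}_{qn'}) - \frac{1}{qn' - n'} H(\tau^{(n)}, \mathcal{D}_{qn'}) \right| < \delta \right) = 1. \tag{36}$$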

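Now let $\varepsilon > 0$. By Proposition 5.2 and the assumption that $\alpha < 1$, for small enough $\varepsilon$, large enough $m$ and all sufficiently large $n$,

$$\mathbb{P}_{n' \le i \le qn}\left( H_m\big((\tau^{(n)})^{x,i}\big) < 1 - \varepsilon \right) \ge \mathbb{P}_{n' \le i \le qn}\left( H_m\big((\tau^{(n)})^{x,i}\big) < \alpha + \varepsilon \right) > 1 - \varepsilon.$$

Choose $\delta > 0$ smaller than the constant of the same name in the conclusion of Theorem 2.8. Then, for sufficiently large $n$, we can apply Theorem 2.8 to the components in the event in equation (36) (for this we re-scale by $2^{n'}$ and note that the measures $\nu^{(n)}_{x,i}$ are supported on level-$n'$ dyadic cells and $\tau^{(n)}$ is supported on an interval of the same order of magnitude). We conclude that every component $\nu^{(n)}_{x,i}$ in the event in question satisfies $\frac{1}{qn - n'} H(\nu^{(n)}_{x,i}, \mathcal{D}_{qn}) < \varepsilon$, and hence by (36),

$$\lim_{n \to \infty} \mathbb{P}_{i=n'}\left( \frac{1}{qn - n'} H(\nu^{(n)}_{x,i}, \mathcal{D}_{qn}) < \varepsilon \right) = 1.$$

Thus, from the definition of conditional entropy and the last equation,

$$\lim_{n \to \infty} \frac{1}{qn - n'} H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{n'}) = \lim_{n \to \infty} \frac{1}{qn - n'} \mathbb{E}_{i=n'}\left( H(\nu^{(n)}_{x,i}, \mathcal{D}_{qn}) \right) = \lim_{n \to \infty} \mathbb{E}_{i=n'}\left( \frac{1}{qn - n'} H(\nu^{(n)}_{x,i}, \mathcal{D}_{qn}) \right) < \varepsilon.$$

Since $\varepsilon$ was arbitrary, this is Theorem 1.3.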

5.3 Proof of Theorem 1.4 (the non-uniformly contracting case)

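We now consider the situation for general IFS, in which the contraction $r_i$ of $\varphi_i$ is not constant. Again assume that $0$ is in the attractor. Let $r = \prod_{i \in \Lambda} r_i^{p_i}$ and $n' = n \log_2(1/r)$ as in the introduction, and define $\widetilde{\nu}^{(n)}$ as before. Given $n$, let

$$R_n = \{ r_i : i \in \Lambda^n \}.$$

Note that $|R_n| = O(n^{|\Lambda|})$. Therefore $H(\widetilde{\nu}^{(n)}, \{\mathbb{R}\} \times \mathcal{F}) = O(\log n)$, and consequently for all $k$

$$H(\widetilde{\nu}^{(n)}, \widetilde{\mathcal{D}}_k) = H(\nu^{(n)}, \mathcal{D}_k) + O(\log n).$$

Thus

$$H(\widetilde{\nu}^{(n)}, \widetilde{\mathcal{D}}_{qn} \mid \widetilde{\mathcal{D}}_{n'}) = H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{n'}) + O(\log n),$$

and our goal reduces to proving that for every $q > 1$,

$$\frac{1}{qn} H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{n'}) \to 0 \quad \text{as } n \to \infty.$$

Furthermore, for every $\varepsilon > 0$,

$$H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) = H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{n'}) - O(\varepsilon n),$$

so it will suffice for us to prove that

$$\limsup_{n \to \infty} \frac{1}{qn} H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) = o(1) \quad \text{as } \varepsilon \to 0.$$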

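Fix $\varepsilon > 0$. For $t \in R_n$ let

$$\Lambda^{n,t} = \{ i \in \Lambda^n : r_i = t \}, \qquad p^{n,t} = \sum_{i \in \Lambda^{n,t}} p_i,$$

so $\{p^{n,t}\}_{t \in R_n}$ is a probability vector. It will sometimes be convenient to consider $i \in \Lambda^n$, $i \in \Lambda^{n,t}$ and $t \in R_n$ as random elements drawn according to the probabilities $p_i$, $p_i / p^{n,t}$, and $p^{n,t}$, respectively. Then we interpret expressions such as $\mathbb{P}_{i \in \Lambda^n}(A)$, $\mathbb{P}_{i \in \Lambda^{n,t}}(A)$ and $\mathbb{P}_{t \in R_n}(A)$ in the obvious manner, and similarly expectations. With this notation, we can define

$$\nu^{(n,t)} = \mathbb{E}_{i \in \Lambda^{n,t}}\left( \delta_{\varphi_i(0)} \right).$$

This is a probability measure on $\mathbb{R}$ representing the part of $\nu^{(n)}$ coming from contractions by $t$; indeed,

$$\nu^{(n)} = \mathbb{E}_{t \in R_n}\left( \nu^{(n,t)} \right). \tag{37}$$

For $t > 0$ let $\tau^{(t)}$ be the measure

$$\tau^{(t)}(A) = \mu(t^{-1} A)$$

(note that we are no longer using logarithmic scale, so the measure that was previously denoted $\tau^{(n)}$ is now $\tau^{(r^n)}$). We then have

$$\mu = \mathbb{E}_{t \in R_n}\left( \nu^{(n,t)} * \tau^{(t)} \right). \tag{38}$$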

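Fix $\varepsilon > 0$. Arguing as in the previous section, using equation (38) and concavity of entropy, we have

$$\begin{aligned}
\alpha &= \lim_{n \to \infty} \frac{1}{qn - (1-\varepsilon)n'} H(\mu, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) \\
&\ge \limsup_{n \to \infty} \frac{1}{qn - (1-\varepsilon)n'} \mathbb{E}_{t \in R_n}\left( H(\nu^{(n,t)} * \tau^{(t)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) \right).
\end{aligned} \tag{39}$$

By the law of large numbers,

$$\lim_{n \to \infty} \mathbb{P}_{i \in \Lambda^n}\left( 2^{-(1+\varepsilon)n'} < r_i < 2^{-(1-\varepsilon)n'} \right) = 1,$$

or, equivalently,

$$\lim_{n \to \infty} \mathbb{P}_{t \in R_n}\left( 2^{-(1+\varepsilon)n'} < t < 2^{-(1-\varepsilon)n'} \right) = 1. \tag{40}$$

Using $H_k(\mu) \to \alpha$ and the definition of $\tau^{(t)}$, we conclude that

$$\lim_{n \to \infty} \mathbb{P}_{t \in R_n}\left( \frac{1}{qn - (1-\varepsilon)n'} H(\tau^{(t)}, \mathcal{D}_{qn}) \ge (1-\varepsilon)\alpha \right) = 1.$$

Also, since $\tau^{(t)}$ is supported on an interval of order $t$, from (40), (39) and concavity of entropy,

$$\begin{aligned}
\alpha &\ge \limsup_{n \to \infty} \frac{1}{qn - (1-\varepsilon)n'} \mathbb{E}_{t \in R_n} \mathbb{E}_{i=n'}\left( H(\nu^{(n,t)}_{x,i} * \tau^{(t)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) \right) \\
&= \limsup_{n \to \infty} \frac{1}{qn - (1-\varepsilon)n'} \mathbb{E}_{t \in R_n} \mathbb{E}_{i=n'}\left( H(\nu^{(n,t)}_{x,i} * \tau^{(t)}, \mathcal{D}_{qn}) \right).
\end{aligned} \tag{41}$$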

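This is the analogue of Equation (34) in the proof of the uniformly contracting case, and from here one proceeds exactly as in that proof to conclude that there is a function $\delta(\varepsilon)$, tending to $0$ as $\varepsilon \to 0$, such that

$$\limsup_{n \to \infty} \frac{1}{qn} \mathbb{E}_{t \in R_n}\left( H(\nu^{(n,t)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) \right) < \delta(\varepsilon).$$

Now, using Equation (37) and the fact that the entropy of the distribution $\{p^{n,t}\}_{t \in R_n}$ is $o(n)$ as $n \to \infty$, by Lemma 3.1 (6) one concludes that

$$\limsup_{n \to \infty} \frac{1}{qn} H(\nu^{(n)}, \mathcal{D}_{qn} \mid \mathcal{D}_{(1-\varepsilon)n'}) \le \delta(\varepsilon),$$

which is what we wanted to prove.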

5.4 Transversality and the dimension of exceptions 

In this section we prove Theorem 1.8. Let $I \subseteq \mathbb{R}$ be a compact interval, and for $t \in I$ let $\Phi_t = \{\varphi_{i,t}\}_{i \in \Lambda}$ be an IFS, $\varphi_{i,t}(x) = r_i(t)(x - a_i(t))$. We define $\varphi_{i,t}$ and $r_i(t)$ for $i \in \Lambda^n$ as usual, set $\Delta_{i,j}(t) = \varphi_{i,t}(0) - \varphi_{j,t}(0)$ when $i,j \in \Lambda^n$, and for $i,j \in \Lambda^{\mathbb{N}}$ define $\Delta_{i,j}(t) = \lim_{n \to \infty} \Delta_{i_1 \ldots i_n,\, j_1 \ldots j_n}(t)$ (this is well defined since $\varphi_{i_1 \ldots i_n, t}(0)$ converges, in fact exponentially, as $n \to \infty$).

For $i,j \in \Lambda^n$ or $i,j \in \Lambda^{\mathbb{N}}$ let $i \wedge j$ denote the longest common initial segment of $i$ and $j$, and $|i \wedge j|$ its length, so $|i \wedge j| = \min\{k : i_k \ne j_k\} - 1$. Let

$$r_{\min} = \min_{i \in \Lambda} \min_{t \in I} |r_i(t)|,$$

so $0 < r_{\min} < 1$. For a $C^k$-function $F : I \to \mathbb{R}$ write $F^{(p)} = \frac{d^p}{dt^p} F$, and

$$\|F\|_{I,k} = \max_{p \in \{0, \ldots, k\}} \max_{t \in I} |F^{(p)}(t)|.$$

In particular we write

$$R_k = \max_{i \in \Lambda} \|r_i\|_{I,k}.$$

Definition 5.6. The family $\{\Phi_t\}_{t \in I}$ is transverse of order $k$ if $r_i(\cdot), a_i(\cdot)$ are $k$-times continuously differentiable and there is a constant $c > 0$ such that for every $n \in \mathbb{N}$ and distinct $i,j \in \Lambda^n$,

$$\forall t_0 \in I \ \ \exists p \in \{0, 1, 2, \ldots, k\} \ \text{ such that } \ |\Delta^{(p)}_{i,j}(t_0)| \ge c \cdot |i \wedge j|^{-p} \cdot r_{i \wedge j}(t_0). \tag{42}$$

The classical notion of transversality roughly corresponds to the case $k = 1$ in this definition, see e.g. [21, Definition 2.7]. Unlike the classical notion, which either fails or is difficult to verify in many cases of interest, higher-order transversality holds almost automatically. To begin with, let $i,j \in \Lambda^n$ and observe that

$$\Delta_{i,j}(t) = r_{i \wedge j}(t) \, \widetilde{\Delta}_{i,j}(t),$$

where, writing $u, v$ for the sequences obtained from $i,j$ after deleting the longest common initial segment,

$$\widetilde{\Delta}_{i,j}(t) = \Delta_{u,v}(t).$$
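The factorization of $\Delta_{i,j}$ through the common prefix can be checked mechanically at a fixed parameter value. The Python sketch below uses exact rational arithmetic; the ratios `r`, translations `a`, the sample words and all function names are hypothetical choices made for illustration, with maps of the form $x \mapsto r_i (x - a_i)$ as in this section.

```python
from fractions import Fraction

def compose(word, r, a):
    """Return (slope, value at 0) of phi_{i_1} o ... o phi_{i_n},
    where phi_i(x) = r[i] * (x - a[i]); the empty word gives the identity."""
    slope, at_zero = Fraction(1), Fraction(0)
    for i in reversed(word):
        # wrap phi_i around the composition built so far
        slope, at_zero = r[i] * slope, r[i] * (at_zero - a[i])
    return slope, at_zero

def delta(i, j, r, a):
    """Delta_{i,j} = phi_i(0) - phi_j(0) at the fixed parameter value."""
    return compose(i, r, a)[1] - compose(j, r, a)[1]

def common_prefix(i, j):
    """Longest common initial segment of the words i and j."""
    k = 0
    while k < min(len(i), len(j)) and i[k] == j[k]:
        k += 1
    return i[:k]

def factorization_holds(i, j, r, a):
    """Check Delta_{i,j} = r_{i^j} * Delta_{u,v}, with u, v the remaining suffixes."""
    w = common_prefix(i, j)
    u, v = i[len(w):], j[len(w):]
    r_w = compose(w, r, a)[0]
    return delta(i, j, r, a) == r_w * delta(u, v, r, a)

# Hypothetical parameter values at one fixed t.
r = [Fraction(1, 3), Fraction(1, 2)]
a = [Fraction(0), Fraction(-2)]
print(factorization_holds((0, 1, 1, 0), (0, 1, 0, 1), r, a))
```

The check succeeds because the composition along the common prefix is affine with slope $r_{i \wedge j}$, so it rescales the difference of the two suffix images by exactly that factor.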

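Differentiating $\widetilde{\Delta}_{i,j}(t) = r_{i \wedge j}(t)^{-1} \cdot \Delta_{i,j}(t)$ a total of $p$ times,

$$\widetilde{\Delta}^{(p)}_{i,j}(t) = \frac{d^p}{dt^p}\left( r_{i \wedge j}(t)^{-1} \cdot \Delta_{i,j}(t) \right) = \sum_{q=0}^{p} \binom{p}{q} \cdot \frac{d^q}{dt^q}\left( r_{i \wedge j}(t)^{-1} \right) \cdot \Delta^{(p-q)}_{i,j}(t).$$

A calculation shows that

$$\left| \frac{d^q}{dt^q}\left( r_{i \wedge j}(t)^{-1} \right) \right| = O_{k,R_k}\left( |i \wedge j|^{q} \cdot |r_{i \wedge j}(t)|^{-1} \right).$$

Thus we have the bound

$$|\widetilde{\Delta}^{(p)}_{i,j}(t)| = O_{k,R_k}\left( \max_{0 \le q \le p} \left( |i \wedge j|^{q} \cdot |r_{i \wedge j}(t)|^{-1} \cdot |\Delta^{(p-q)}_{i,j}(t)| \right) \right).$$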

Proposition 5.7. Suppose $r_i(\cdot), a_i(\cdot)$ are real-analytic on $I$. Suppose that for $i,j \in \Lambda^{\mathbb{N}}$, $\Delta_{i,j} \equiv 0$ on $I$ if and only if $i = j$. Then the associated family $\{\Phi_t\}_{t \in I}$ is transverse of order $k$ for some $k$.

Proof. First, for $x \in I$ we can extend $r_i, a_i$ analytically to a complex neighborhood $U_x$ of $x$ on which $|r_i|$ are still bounded uniformly away from 1. Define $\Delta_{i,j}(z)$ as before for $i,j \in \Lambda^n$ and $z \in U_x$, and note that for $i,j \in \Lambda^{\mathbb{N}}$ the limit $\Delta_{i,j}(z) = \lim_{n \to \infty} \Delta_{i_1 \ldots i_n,\, j_1 \ldots j_n}(z)$ is uniform for $z \in U_x$. This shows that $\Delta_{i,j}(t)$ is also real-analytic on $I$.

Given $k$, from the expression for $\widetilde{\Delta}^{(p)}_{i,j}$ above, we see that if $c > 0$ and there exists $t_0 \in I$ such that $|\Delta^{(p)}_{i,j}(t_0)| < c \cdot |i \wedge j|^{-p} \cdot r_{i \wedge j}(t_0)$ for all $0 \le p \le k$, then $|\widetilde{\Delta}^{(p)}_{i,j}(t_0)| < c'$ for all $0 \le p \le k$, where $c' = O_{k,R_k}(c)$. For each $k$ choose $c_k > 0$ such that the associated $c'_k$ satisfies $c'_k < 1/k$.

Suppose that for all $k$ the family $\{\Phi_t\}$ is not transverse of order $k$. Then by assumption we can choose $n(k)$ and distinct $i^{(k)}, j^{(k)} \in \Lambda^{n(k)}$, and a point $t_k \in I$, such that $|\Delta^{(p)}_{i^{(k)},j^{(k)}}(t_k)| < c_k \cdot |i^{(k)} \wedge j^{(k)}|^{-p} \cdot r_{i^{(k)} \wedge j^{(k)}}(t_k)$ for $0 \le p \le k$, and hence $|\widetilde{\Delta}^{(p)}_{i^{(k)},j^{(k)}}(t_k)| < c'_k$. Let $u^{(k)}$ and $v^{(k)}$ denote the sequences obtained from $i^{(k)}$ and $j^{(k)}$ by deleting the first $|i^{(k)} \wedge j^{(k)}|$ symbols, so that the first symbols of $u^{(k)}$ and $v^{(k)}$ now differ and $\Delta_{u^{(k)},v^{(k)}} = \widetilde{\Delta}_{i^{(k)},j^{(k)}}$. Hence we have

$$|\Delta^{(p)}_{u^{(k)},v^{(k)}}(t_k)| < c'_k < \frac{1}{k} \quad \text{for all } 0 \le p \le k. \tag{43}$$

Passing to a subsequence $k_\ell$, we may assume that $t_{k_\ell} \to t_0$ and that $u^{(k_\ell)} \to u \in \Lambda^{\mathbb{N}}$ and $v^{(k_\ell)} \to v \in \Lambda^{\mathbb{N}}$ (the latter in the sense that all coordinates stabilize eventually to the corresponding coordinate in the limit sequence). Note that $u \ne v$ because $u^{(k_\ell)}, v^{(k_\ell)}$ differ in their first symbol for all $\ell$, hence so do $u, v$. It follows that $\Delta_{u^{(k_\ell)},v^{(k_\ell)}} \to \Delta_{u,v}$ uniformly and that the same holds for $p$-th derivatives. Hence for all $p \ge 0$, using uniform convergence and (43),

$$|\Delta^{(p)}_{u,v}(t_0)| = \lim_{\ell \to \infty} |\Delta^{(p)}_{u^{(k_\ell)},v^{(k_\ell)}}(t_{k_\ell})| = 0.$$

But $\Delta_{u,v}$ is real analytic, so the vanishing of its derivatives implies $\Delta_{u,v} \equiv 0$ on $I$, contrary to the hypothesis. □

We turn now to the implications of transversality. The key implication is provided 
by the following simple lemma. 

Lemma 5.8. Let $k \in \mathbb{N}$ and let $F$ be a $k$-times continuously differentiable function on a compact interval $J \subseteq \mathbb{R}$. Let $M = \|F\|_{J,k}$ and let $0 < c < 1$ be such that for every $x \in J$ there is a $p \in \{0, \ldots, k\}$ with $|F^{(p)}(x)| > c$. Then for every $0 < \rho < c/2^k$ the set $F^{-1}(-\rho, \rho) \subseteq J$ can be covered by $O_{k,M,|J|}(1/c^2)$ intervals of length $\le 2(\rho/c)^{1/2^k}$ each.

Proof. For brevity, we shall suppress dependence on the parameters $k, M, |J|$, so throughout this proof, $O(\cdot) = O_{k,M,|J|}(\cdot)$.

The proof is by induction on $k$. For $k = 0$ the hypothesis is that $|F^{(0)}(x)| = |F(x)| > c$ for all $x \in J$, hence $F^{-1}(-\rho, \rho) = \emptyset$ for $0 < \rho < c = c/2^0$ and the assertion is trivial.

Assume that we have proved the claim for $k - 1$ and consider the case $k$. Let $J'$ be a maximal closed interval in $F^{-1}[-c, c]$ and let $G = F'|_{J'}$. Note that $G$ satisfies the hypothesis for $k - 1$ and the same value of $c$ and $M$, and $\sqrt{c\rho} < c/2^{k-1}$, so from the induction hypothesis we find that $G^{-1}(-\sqrt{c\rho}, \sqrt{c\rho})$ can be covered by $O(1/c)$ intervals of length $\le 2(\sqrt{c\rho}/c)^{1/2^{k-1}} = 2(\rho/c)^{1/2^k}$ each. Let $U$ denote the union of this cover and consider the intervals $J'_i$ which are the closures of the maximal sub-intervals in $J' \setminus U$. By the above, the number of such intervals $J'_i$ is $\le O(1/c)$. Now, on each $J'_i$ we have $|F'| \ge \sqrt{c\rho}$, so by continuity of $F'$ either $F' \ge \sqrt{c\rho}$ or $F' \le -\sqrt{c\rho}$ in all of $J'_i$. An elementary consequence of this is that $J'_i \cap F^{-1}(-\rho, \rho)$ is an interval of length at most $2\rho/\sqrt{c\rho} = 2\sqrt{\rho/c} \le 2(\rho/c)^{1/2^k}$. In summary we have covered $J' \cap F^{-1}(-\rho, \rho)$ by $O(1/c)$ intervals of length $2(\rho/c)^{1/2^k}$ each.

It remains to show that there are $O(1/c)$ maximal intervals $J' \subseteq F^{-1}[-c, c]$ as in the paragraph above. In fact, we only need to bound the number of such $J'$ that intersect $F^{-1}(-\rho, \rho)$. For $J'$ of this kind, if $J' = J$ we are done, since this means there is just one such interval. Otherwise there is an endpoint $a \in J'$ with $|F(a)| = c$. There is also a point $b \in J'$ with $|F(b)| < \rho < c/2^k$. Since $|F'| \le M$, we conclude that $|J'| \ge |b - a| \ge (c - \rho)/M \ge c/2M$. Thus, since the intervals $J'$ are disjoint, their number is $\le |J|/(c/2M) = O(1/c)$, completing the induction step. □
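To make the lemma concrete, the following Python sketch locates the sublevel set $\{|F| < \rho\}$ numerically for the hypothetical example $F(x) = x^2$ on $J = [-1, 1]$, where $F'' = 2$ everywhere, so the hypothesis holds with $k = 2$ and, say, $c = 1/2$. The grid scan, the function name and the chosen values of $\rho$ and $c$ are assumptions for illustration; the sketch compares the observed interval length with the length bound $2(\rho/c)^{1/2^k}$ and does not reproduce the inductive covering argument.

```python
def sublevel_intervals(F, a, b, rho, steps=20000):
    """Approximate the maximal intervals of {x in [a, b] : |F(x)| < rho}
    by scanning a uniform grid; endpoints are accurate up to one grid step."""
    h = (b - a) / steps
    intervals, start = [], None
    for n in range(steps + 1):
        x = a + n * h
        inside = abs(F(x)) < rho
        if inside and start is None:
            start = x                      # a new interval begins here
        elif not inside and start is not None:
            intervals.append((start, x - h))  # previous grid point was the last one inside
            start = None
    if start is not None:
        intervals.append((start, b))
    return intervals

# Hypothetical example: F(x) = x^2, k = 2, c = 1/2, rho = 0.01.
k, c, rho = 2, 0.5, 0.01
found = sublevel_intervals(lambda x: x * x, -1.0, 1.0, rho)
length = found[0][1] - found[0][0]
print(len(found), length, 2 * (rho / c) ** (1 / 2 ** k))
```

Here the sublevel set is the single interval $(-\sqrt{\rho}, \sqrt{\rho})$ of length $2\sqrt{\rho} = 0.2$, comfortably below the bound $2(\rho/c)^{1/4}$.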
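Let $\operatorname{bdim} X$ denote the upper box dimension of a set $X$, defined by

$$\operatorname{bdim} X = \limsup_{r \to 0} \frac{\log \min\{\ell : X \text{ can be covered by } \ell \text{ balls of radius } r\}}{\log(1/r)}.$$

One always has $\dim X \le \operatorname{bdim} X$. The packing dimension is defined by

$$\operatorname{pdim} X = \inf\left\{ \sup_n \operatorname{bdim} X_n \ : \ X \subseteq \bigcup_{n=1}^{\infty} X_n \right\}.$$

Note that $\dim X \le \operatorname{pdim} X$, and $Y \subseteq X$ implies $\operatorname{pdim} Y \le \operatorname{pdim} X$.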
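As a small illustration of the box-dimension formula, the Python sketch below counts covering cells for the middle-thirds Cantor set, an example chosen here and not discussed in the paper. It counts triadic grid cells rather than balls of radius $r$, which changes the count by at most a bounded factor and therefore not the limit; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import floor, log

def cantor_left_endpoints(level):
    """Left endpoints of the 2**level intervals in stage `level` of the
    middle-thirds Cantor construction, as exact fractions."""
    return [sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))
            for digits in product((0, 2), repeat=level)]

def box_count(points, m):
    """Number of grid cells [j/3^m, (j+1)/3^m) containing at least one point."""
    return len({floor(x * 3 ** m) for x in points})

def box_dimension_estimate(level, m):
    """log(number of cells) / log(1 / cell size) at scale 3^-m."""
    pts = cantor_left_endpoints(level)
    return log(box_count(pts, m)) / log(3 ** m)

print(box_count(cantor_left_endpoints(8), 5))
print(box_dimension_estimate(8, 5), log(2) / log(3))
```

At scale $3^{-m}$ exactly $2^m$ cells are occupied, so the ratio equals $\log 2 / \log 3$ at every such scale.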

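Theorem 5.9. If $\{\Phi_t\}_{t \in I}$ satisfies transversality of order $k \ge 1$ on the compact interval $I$, then the set $E$ of "exceptional" parameters in Theorem 1.7 has packing (and hence Hausdorff) dimension 0.

Proof. Write

$$M = \sup_{n} \ \sup_{i,j \in \Lambda^n} \|\Delta_{i,j}\|_{I,k}.$$

That $M < \infty$ follows from $k$-fold continuous differentiability of $r_i(\cdot), a_i(\cdot)$ and the fact that $|r_i|$ are bounded away from 1 on $I$. By transversality there is a constant $c > 0$ such that for every $t \in I$, every $n$ and all distinct $i,j \in \Lambda^n$,

$$\left| \frac{d^p}{dt^p} \Delta_{i,j}(t) \right| \ge c \cdot |i \wedge j|^{-p} \cdot r_{\min}^{|i \wedge j|} \quad \text{for some } p \in \{0, \ldots, k\}.$$

In what follows we suppress the dependence on $k, M, c$ and $|I|$ in the $O(\cdot)$ notation.

We may assume that $c < 1$ and $k \ge 2$. Let $\varepsilon < c\, r_{\min}/2^k$ and fix $n$ and distinct $i,j \in \Lambda^n$. By the previous lemma, for all $0 < \rho < 2^{-k} c\, |i \wedge j|^{-k} r_{\min}^{|i \wedge j|}$, and in particular for $0 < \rho < c\, r_{\min}^n/(2n)^k$, the set $\{t \in I : |\Delta_{i,j}(t)| < \rho\}$ can be covered by at most $O((2n)^k / r_{\min}^n)$ intervals of length $2((2n)^k \rho / r_{\min}^n)^{1/2^k}$ each. Now set $\rho = \varepsilon^n$ (our choice of $\varepsilon$ guarantees that $\rho$ is in the proper range) and let $i,j$ range over their $\le |\Lambda|^{2n}$ different possible values. We find that the set

$$E_{\varepsilon,n} = \bigcup_{i,j \in \Lambda^n,\ i \ne j} (\Delta_{i,j})^{-1}(-\varepsilon^n, \varepsilon^n)$$

can be covered by $O((2n)^k |\Lambda|^{2n} / r_{\min}^n)$ intervals of length $\le 2((2n)^k \varepsilon^n / r_{\min}^n)^{1/2^k}$. Now, $E \subseteq E_\varepsilon$, where

$$E_\varepsilon = \bigcup_{N=1}^{\infty} \bigcap_{n > N} E_{\varepsilon,n}. \tag{44}$$

By the above, for each $\varepsilon$ and $N$ we have

$$\operatorname{bdim}\left( \bigcap_{n > N} E_{\varepsilon,n} \right) \le \lim_{n \to \infty} \frac{\log\left( (2n)^k |\Lambda|^{2n} / r_{\min}^n \right)}{\log\left( 1 \big/ \left( (2n)^k \varepsilon^n / r_{\min}^n \right)^{1/2^k} \right)} = 2^k \cdot \frac{\log(|\Lambda|^2 / r_{\min})}{\log(r_{\min}/\varepsilon)}.$$

The last expression is $o(1)$ as $\varepsilon \to 0$, uniformly in $N$. Thus by (44), the same is true of $E_\varepsilon$, and $E \subseteq E_\varepsilon$ for all $\varepsilon$, so $E$ has packing (and Hausdorff) dimension 0. □

Theorem 1.8 now follows by combining Proposition 5.7 and Theorem 5.9.

5.5 Miscellaneous proofs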

To complete the proof of Corollary 1.5 we have:

Lemma 5.10. Let $A$ be a finite set of algebraic numbers over $\mathbb{Q}$. Then there is a constant $0 < s < 1$ such that for any polynomial expression $x$ of degree $n$ in the elements of $A$, either $x = 0$ or $|x| > s^n$.

Proof. Choose an algebraic integer $\alpha$ such that $A \subseteq \mathbb{Q}(\alpha)$. Since the statement is unchanged if we multiply all elements of $A$ by an integer, we can assume that the elements of $A$ are integer polynomials in $\alpha$ of degree $\le q$ and coefficients bounded by $N$, for some $q, N$. Substituting these polynomials into the expression for $x$, we have an expression $x = \sum_{k=0}^{qn} n_k \alpha^k$ where $n_k \in \mathbb{N}$ and $|n_k| \le N$. It suffices to prove that any such expression is either $0$ or $\ge s^n$ for $0 < s < 1$ independent of $n$ (but depending on $\alpha, N$). In proving this last statement we may assume that $q = 1$ (replace $s$ by $s^{1/q}$).

Let $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_d$ denote the algebraic conjugates of $\alpha$ and $\sigma_1, \sigma_2, \ldots, \sigma_d$ the automorphisms of $\mathbb{Q}(\alpha)$, with $\sigma_i \alpha = \alpha_i$. If $x \ne 0$ then $\prod_{i=1}^{d} \sigma_i(x) \in \mathbb{Z} \setminus \{0\}$, so

$$1 \le \left| \prod_{i=1}^{d} \sigma_i(x) \right| = |x| \cdot \prod_{i=2}^{d} \left| \sum_{k=0}^{n} n_k \alpha_i^k \right| \le |x| \cdot \prod_{i=2}^{d} \sum_{k=0}^{n} |n_k| \, |\alpha_i|^k \le |x| \cdot \left( n \cdot N \cdot \alpha_{\max}^n \right)^{d-1},$$

where $\alpha_{\max} = \max\{|\alpha_2|, \ldots, |\alpha_d|\}$. Dividing out gives the lemma. □
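The norm argument can be tried out by brute force in the simplest case. The Python sketch below takes $\alpha = \sqrt{2}$ (so $d = 2$, with conjugate $-\sqrt{2}$), enumerates all expressions $x = \sum_{k=0}^{n} n_k \alpha^k$ with integer coefficients $|n_k| \le N$, and compares the smallest nonzero $|x|$ with the lower bound coming from the norm. The choice of $\alpha$, the function names, and the use of $n + 1$ (the exact number of terms) in place of $n$ are assumptions made for this illustration.

```python
from itertools import product
from math import sqrt

def min_nonzero_value(n, N):
    """Smallest nonzero |x| over x = sum_{k=0}^n n_k * sqrt(2)^k with |n_k| <= N."""
    best = None
    for coeffs in product(range(-N, N + 1), repeat=n + 1):
        # sqrt(2)^k equals 2^(k//2) for even k and 2^(k//2) * sqrt(2) for odd k,
        # so x = a + b*sqrt(2) with integers a and b.
        a = sum(c * 2 ** (k // 2) for k, c in enumerate(coeffs) if k % 2 == 0)
        b = sum(c * 2 ** (k // 2) for k, c in enumerate(coeffs) if k % 2 == 1)
        if a == 0 and b == 0:
            continue  # x = 0, since sqrt(2) is irrational
        norm = abs(a * a - 2 * b * b)        # |x * conjugate(x)|, a positive integer
        value = norm / abs(a - b * sqrt(2))  # |x| = norm / |conjugate(x)|
        if best is None or value < best:
            best = value
    return best

def lower_bound(n, N):
    """1 / ((n + 1) * N * sqrt(2)^n): from 1 <= |x| * |conjugate(x)|."""
    return 1.0 / ((n + 1) * N * sqrt(2) ** n)

print(min_nonzero_value(5, 1), lower_bound(5, 1))
```

The bound decays like $2^{-n/2}$ up to a polynomial factor, which is the exponential lower bound $s^n$ asserted by the lemma in this special case.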

We finish with some comments on Sinai's problem, Theorem 1.11. We first state a generalization of Theorem 1.7 needed to treat families of IFSs that contract only on average.

Suppose that for $t \in I$ we have a family $\Phi_t = \{\varphi_{i,t}\}_{i \in \Lambda}$ of (not necessarily contracting) similarities of $\mathbb{R}$, and as usual write $\varphi_{i,t} = r_{i,t} U_{i,t} + a_{i,t}$. Let $p$ be a fixed probability vector and suppose that for each $t$ we have $\sum_{i \in \Lambda} p_i \log r_i < 0$, i.e. the systems contract on average. One can then show that there is a unique probability measure $\mu_t$ on $\mathbb{R}$ satisfying $\mu_t = \sum_{i \in \Lambda} p_i \cdot \varphi_{i,t} \mu_t$, that $H(\mu_t, \mathcal{D}_m) < \infty$ for every $t$ and $m$, and that $\mu_t([-R, R]) \to 1$ as $R \to \infty$ uniformly in $t$. Under these conditions one can verify the stronger property that for every $t \in I$ we have

$$\left| H_m(\mu_t) - H_m\big((\mu_t)_{[-R,R]}\big) \right| = o(1) \quad \text{as } R \to \infty$$

uniformly in $t$ and $m$.

Theorem 5.11. Let $(\Phi_t)_{t \in I}$, $p$, and $\mu_t$ be as in the preceding paragraph. Let $\widetilde{\mu}$ denote the product measure on $\Lambda^{\mathbb{N}}$ with marginal $p$, and suppose that $A \subseteq \Lambda^{\mathbb{N}}$ is a Borel set such that $\widetilde{\mu}(A) > 0$. Write

$$E = \bigcap_{\varepsilon > 0} \left( \bigcup_{N=1}^{\infty} \bigcap_{n > N} \left( \bigcup_{\substack{i,j \in A \\ i_1 \ldots i_n \ne j_1 \ldots j_n}} (\Delta_{i_1 \ldots i_n,\, j_1 \ldots j_n})^{-1}(-\varepsilon^n, \varepsilon^n) \right) \right).$$

Then $\dim \mu_t = \min\{1, \text{s-dim}\, \mu_t\}$ for every $t \in I \setminus E$. Furthermore suppose that $I \subseteq \mathbb{R}$ is compact and connected, and that the parametrization is analytic in the sense of Theorem 1.8. If

$$\forall i,j \in A \quad \left( \Delta_{i,j} \equiv 0 \text{ on } I \iff i = j \right)$$

then the set $E$ above is of packing (and Hausdorff) dimension at most $k - 1$, and in particular of Lebesgue measure 0.

The proof is the same as the proofs of Theorems 1.7 and 1.8, except that in analyzing the resulting convolution one must approximate $\mu_t$ by $(\mu_t)_{[-R,R]}$ for an appropriately large $R$ that is fixed in advance, with the scale $n$ large relative to $R$. We omit the details.

Let us see how this applies to Theorem 1.11, where $\varphi_{-1,\alpha}(x) = (1 - \alpha)x - 1$ and $\varphi_{1,\alpha}(x) = (1 + \alpha)x + 1$ for $\alpha \in (0, 1]$, and $p = (1/2, 1/2)$. It suffices to consider the system for $\alpha \in [s, 1]$ for some $s > 0$. Let $A$ be the set of $i \in \{-1, 1\}^{\mathbb{N}}$ such that $\left| \frac{1}{n} \sum_{k=1}^{n} 1_{\{i_k = 1\}} - \frac{1}{2} \right| < \delta$ for $n > N(\delta)$, where $\delta > 0$ is small enough to ensure that $|\varphi'_{i_1 \ldots i_n}| < 1$ when this condition holds, and $N(\delta)$ is large enough that $\widetilde{\mu}(A) > 0$; in fact we can make $\widetilde{\mu}(A)$ arbitrarily close to 1, by the law of large numbers. It remains to verify for $i,j \in A$ that $\Delta_{i,j}$ vanishes on $[s, 1]$ if and only if $i = j$. Note that for $i \in \{-1, 1\}^n$,

$$\varphi_{i,\alpha}(0) = 1 + (1 + i_1 \alpha) + (1 + i_1 \alpha)(1 + i_2 \alpha) + \ldots + \prod_{k=1}^{n} (1 + i_k \alpha).$$

Thus $\Delta_{i,j}$ is a series whose terms are of the form $c_{k,m} (1 - \alpha)^k (1 + \alpha)^m$ for some $c_{k,m} \in \{0, \pm 1\}$, and $i = j$ if and only if all terms are 0. Furthermore, there is an $n_0$ such that if $k + m > n_0$ and $c_{k,m} \ne 0$, then $k > (1 - \delta)m$. Thus since $s \le \alpha \le 1$ and $\delta$ was chosen small enough, the series converges uniformly on $[s, 1]$, and furthermore there is an $\varepsilon > 0$ such that the series converges uniformly on some larger interval $[s, 1 + \varepsilon]$, and even in a neighborhood of 1 in the complex plane. Hence $\Delta_{i,j}(\cdot)$ is real-analytic on $[s, 1 + \varepsilon]$ and is given by this series. Now, if $i \ne j$ we can divide out by the highest power $(1 - \alpha)^{k_0}$ that is common to all the terms (possibly $k_0 = 0$), and evaluate the resulting function at $\alpha = 1$. We get a finite sum of the form $\sum_{(k,m) \in U} c_{k,m} 2^m$ for some finite set of indices $U \subseteq \mathbb{N}^2$ such that $c_{k,m} \in \{\pm 1\}$ for $(k,m) \in U$. Such a sum cannot vanish, hence by analyticity $\Delta_{i,j} \not\equiv 0$ on every sub-interval of $[s, 1 + \varepsilon]$, and in particular $\Delta_{i,j} \not\equiv 0$ on $[s, 1]$, as desired.
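The contraction-on-average hypothesis is easy to verify for this family: with $p = (1/2, 1/2)$ the average logarithmic contraction is $\frac{1}{2}\log(1 - \alpha) + \frac{1}{2}\log(1 + \alpha) = \frac{1}{2}\log(1 - \alpha^2)$, which is negative for $0 < \alpha < 1$ even though one of the two maps expands. A minimal Python sketch of this computation follows; the function names are hypothetical, and the endpoint $\alpha = 1$, where one ratio is 0, is excluded because the logarithm is undefined there.

```python
from math import log

def average_log_contraction(ratios, probs):
    """sum_i p_i * log|r_i|; a negative value means contraction on average."""
    return sum(p * log(abs(r)) for r, p in zip(ratios, probs))

def sinai_ratios(alpha):
    """Ratios of x -> (1 - alpha)x - 1 and x -> (1 + alpha)x + 1, for 0 < alpha < 1."""
    return (1.0 - alpha, 1.0 + alpha)

for alpha in (0.1, 0.5, 0.9):
    value = average_log_contraction(sinai_ratios(alpha), (0.5, 0.5))
    print(alpha, value, 0.5 * log(1.0 - alpha ** 2))
```

Both printed values agree and are negative for each sampled $\alpha$.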